scylladb

Author	SHA1	Message	Date
Tomasz Grabiec	2db8626dbf	database: Ignore spaces in initial_token list Currently we get a boost::lexical_cast error on startup if initial_token has a list which contains spaces after commas, e.g.: initial_token: -1100081313741479381, -1104041856484663086, ... Fixes #1664. Message-Id: <1473840915-5682-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `a498da1987`)	2016-09-14 12:03:41 +03:00
Avi Kivity	e296fef581	Fix bad backport (`259b2592d4`)	2016-07-15 14:18:50 +03:00
Avi Kivity	5ee6a00b0f	db: don't over-allocate memory for mutation_reader column_family::make_reader() doesn't deal with sstables directly, so it doesn't need to reserve memory for them. Fixes #1453. Message-Id: <1468429143-4354-1-git-send-email-avi@scylladb.com> (cherry picked from commit `d3c87975b0`)	2016-07-15 14:11:01 +03:00
Avi Kivity	64df5f3f38	db: estimate queued read size more conservatively There are plenty of continuations involved, so don't assume it fits in 1k. Message-Id: <1468429516-4591-1-git-send-email-avi@scylladb.com> (cherry picked from commit `23edc1861a`)	2016-07-15 14:09:47 +03:00
Avi Kivity	259b2592d4	db: do not create column family directories belonging to foreign keyspaces Currently, for any column family, we create a directory for it in all keyspace directories. This is incredibly awkward. Fix by iterating over just the keyspace's column families, not all column families in existence. Fixes #1457. Message-Id: <1468495182-18424-1-git-send-email-avi@scylladb.com> (cherry picked from commit `1048e1071b`)	2016-07-15 14:08:46 +03:00
Avi Kivity	8547f34d60	mutation_reader: make restricting_mutation_reader even more restricting While limiting the number of concurrently executing sstable readers reduces our memory load, the queued readers, although consuming a small amount of memory, can still grow without bounds. To limit the damage, add two limits on the queue: - a timeout, which is equal to the read timeout - a queue length limit, which is equal to 2% of the shard memory divided by an estimate of the queued request size (1kb) Together, these limits bound the amount of memory needed by queued disk requests in case the disk can't keep up. Message-Id: <1467206055-30769-1-git-send-email-avi@scylladb.com> (cherry picked from commit `9ac730dcc9`)	2016-06-29 17:29:00 +03:00
Avi Kivity	00692d891e	db: add statistics about queued reads Fixes #1398. (cherry picked from commit `f03cd6e913`)	2016-06-27 19:43:16 +03:00
Avi Kivity	94aa879d19	db: restrict replica read concurrency Since reading mutations can consume a large amount of memory, which, moreover, is not predictable at the time the read is initiated, restrict the number of reads to 100 per shard. This is more than enough to saturate the disk, and hopefully enough to prevent allocation failures. Restriction is applied in column_family::make_sstable_reader(), which is called either on a cache miss or if the cache is disabled. This allows cached reads to proceed without restriction, since their memory usage is supposedly low. Reads from the system keyspace use a separate semaphore, to prevent user reads from blocking system reads. Perhaps we should select the semaphore based on the source of the read rather than the keyspace, but for now using the keyspace is sufficient. Fixes #1398. (cherry picked from commit `edeef03b34`)	2016-06-27 19:43:07 +03:00
Duarte Nunes	ffeef2f072	database: Actually decrease query_state limit query_state expects the current row limit to be updated so it can be enforced across partition ranges. A regression introduced in `e4e8acc946` prevented that from happening by passing a copy of the limit to querying_reader. This patch fixes the issue by having column_family::query update the limit as it processes partitions from the querying_reader. Fixes #1338 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1465804012-30535-1-git-send-email-duarte@scylladb.com> (cherry picked from commit `c896309383`)	2016-06-21 10:03:22 +03:00
Nadav Har'El	ad50d83302	Rewriting shared sstables only after all shards loaded sstables After commit `faa4581`, each shard only starts splitting its shared sstables after opening all sstables. This was important because compaction needs to be aware of all sstables. However, another bug remained: If one shard finishes loading its sstables and starts the splitting compactions, and in parallel a different shard is still opening sstables - the second shard might find a half-written sstable being written by the first shard, and abort on a malformed sstable. So in this patch we start the shared sstable rewrites - on all shards - only after all shards finished loading their sstables. Doing this is easy, because main.cc already contains a list of sequential steps where each uses invoke_on_all() to make sure the step completes on all shards before continuing to the next step. Fixes #1371 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1466426641-3972-1-git-send-email-nyh@scylladb.com> (cherry picked from commit `3372052d48`)	2016-06-20 18:20:01 +03:00
Nadav Har'El	dececbc0b9	Rewrite shared sstables only after entire CF is read Starting in commit `721f7d1d4f`, we start "rewriting" a shared sstable (i.e., splitting it into individual shards) as soon as it is loaded in each shard. However as discovered in issue #1366, this is too soon: Our compaction process relies in several places on compaction being done only after all the sstables of the same CF have been loaded. One example is that we need to know the content of the other sstables to decide which tombstones we can expire (this is issue #1366). Another example is that we use the last generation number we are aware of to decide the number of the next compaction output - and this is wrong before we saw all sstables. So with this patch, while loading sstables we only make a list of shared sstables which need to be rewritten - and the actual rewrite is only started when we finish reading all the sstables for this CF. We need to do this in two cases: reboot (when we load all the existing sstables we find on disk), and nodetool refresh (when we import a set of new sstables). Fixes #1366. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1466344078-31290-1-git-send-email-nyh@scylladb.com> (cherry picked from commit `faa45812b2`)	2016-06-19 17:11:14 +03:00
Nadav Har'El	0a2d4204bd	Rewrite shared sstables soon after startup Several shards may share the same sstable - e.g., when re-starting scylla with a different number of shards, or when importing sstables from an external source. Sharing an sstable is fine, but it can result in excessive disk space use because the shared sstable cannot be deleted until all the shards using it have finished compacting it. Normally, we have no idea when the shards will decide to compact these sstables - e.g., with size-tiered-compaction a large sstable will take a long time until we decide to compact it. So what this patch does is to initiate compaction of the shared sstables - on each shard using it - so that as soon as possible after the restart, the original sstable is split into separate sstables per shard, and the original sstable can be deleted. If several sstables are shared, we serialize this compaction process so that each shard only rewrites one sstable at a time. Regular compactions may happen in parallel, but they will not be able to choose any of the shared sstables because those are already marked as being compacted. Commit `3f2286d0` increased the need for this patch, because since that commit, if we don't delete the shared sstable, we also cannot delete additional sstables which the different shards compacted with it. For one scylla user, this resulted in so much excessive disk space use, that it literally filled the whole disk. After this patch commit `3f2286d0`, or the discussion in issue #1318 on how to improve it, is no longer necessary, because we will never compact a shared sstable together with any other sstable - as explained above, the shared sstables are marked as "being compacted" so the regular compactions will avoid them. Fixes #1314. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1465406235-15378-1-git-send-email-nyh@scylladb.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `721f7d1d4f`)	2016-06-16 14:01:33 +03:00
Tomasz Grabiec	74b8f63e8f	row_cache: Make stronger guarantees in clear/invalidate Correctness of current uses of clear() and invalidate() relies on fact that cache is not populated using readers created before invalidation. Sstables are first modified and then cache is invalidated. This is not guaranteed by current implementation though. As pointed out by Avi, a populating read may race with the call to clear(). If that read started before clear() and completed after it, the cache may be populated with data which does not correspond to the new sstable set. To provide such guarantee, invalidate() variants were adjusted to synchronize using _populate_phaser, similarly like row_cache::update() does. (cherry picked from commit `170a214628`) Conflicts: database.cc	2016-06-16 14:01:33 +03:00
Glauber Costa	30d54cef38	database: add a comment explaining the choice of function in CF stop We have recently committed a fix to a broken streaming bug that involved reverting column_family::stop() back to calling the custom seal functions explicitly for both memtables and streaming memtables. We here add a comment to explain why that had to be done. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <fe94b5883e9c29adc7fc9ee9f498894c057e7b64.1464293167.git.glauber@scylladb.com>	2016-05-29 11:28:15 +03:00
Glauber Costa	46f60f52d9	database: do not use implicitly stated seal function when closing the CF In commit `4981362f57`, I have introduced a regression that was thankfully caught by our dtest infrastructure. That patch is a preparation patch for the active reclaim patchset that is to come, and it consolidated all the flushes using the memtable_list's seal_fn function instead of calling the seal function explicitly. The problem here is that the streaming memtables have the delayed mechanism, about which the memtable_list is unaware. Calling memtable_list's seal_active_memtable() for the streaming memtables calls the delayed version, that does not guarantee flush. If we're lucky, we will indeed flush after the timer expires, but if we're not we'll just stop the CF with data not flushed. There are two options to fix this: the first is to teach the memtable_list about the delayed/forced mechanism, and the second is to just call the correct function explicitly during shutdown, and then when the time comes to add continuations to the result of the seal, add them here as well. Although the second option involves a bit more work and duplication, I think it is better in the sense that the delayed / forced mechanism really is something that belong to the streaming only. Being this the only user, I don't think it justifies complicating the memtable_list with this concept. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <b26017c825ccf585f39f58c4ab3787d78e551f5f.1464126884.git.glauber@scylladb.com>	2016-05-25 08:21:24 +03:00
Pekka Enberg	ceb29f9d32	Merge "Introduce upload dir for sstable migration" from Raphael "This change is intended to make migration process safer and easier. All column families will now have a directory called upload. With this feature, users may choose to copy migrated sstables to upload directory of respective column families, and run 'nodetool refresh'. That's supposed to be the preferred option from now on."	2016-05-24 16:36:47 +03:00
Avi Kivity	9637c2232c	Merge "Move the JMX timer polling logic to Scylla" from Amnon	2016-05-24 13:07:52 +03:00
Raphael S. Carvalho	c2fa3b796d	db: fix read consistency after refresh If sstable loaded by refresh covers a row that is cached by the column family, read query may fail to return consistent data. What we should do is to clear cache for the column family being loaded with new sstables. Fixes #1212. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <a08c9885a5ceb0b2991e40337acf5b7679580a66.1464072720.git.raphaelsc@scylladb.com>	2016-05-24 12:11:41 +03:00
Raphael S. Carvalho	e5f0314afd	db: introduce upload directory for sstable migration This change is intended to make migration process safer and easier. All column families will now have a directory called upload. With this feature, users may choose to copy migrated sstables to upload directory of respective column families, and call 'nodetool refresh'. That's supposed to be the preferred option from now on. For each sstable in upload directory, refresh will do the following: 1) Mutate sstable level to 0. 2) Create hard links to its components in column family dir, using a new generation. We make it safe by creating a hard link to temporary TOC first. 3) Remove all of its components in upload directory. This new code runs after refresh checked for new sstables in the column family directory. Otherwise, we could have a generation conflict. Unlike the first step, this new step runs with sstable write enabled. It's easier here because we know exactly which sstables are new. After that, refresh will load new sstables found in column family and upload directories. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-05-20 17:26:21 -03:00
Raphael S. Carvalho	74c8a87777	sstables: fix statistics rewrite It's not working because it tries to overwrite existing statistics file with exclusive flag. It's fixed by writing new statistics into temporary file and renaming it into place. If Scylla failed in middle of rewrite, a temporary file is left over. So boot code was adjusted to delete a temporary file created by this rewrite procedure. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-05-20 17:24:15 -03:00
Raphael S. Carvalho	ee0f66eef6	db: fix migration of sstables with level greater than 0 Refresh will rewrite statistics of any migrated sstable with level > 0. However, this operation is currently not working because O_EXCL flag is used, meaning that create will fail. It turns out that we don't actually need to change on-disk level of a sstable by overwriting statistics file. We can only set in-memory level of a sstable to 0. If Scylla reboots before all migrated sstables are compacted, leveled strategy is smart enough to detect sstables that overlap, and set their in-memory level to 0. Fixes #1124. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-05-17 11:08:08 -03:00
Amnon Heiman	750f30cf07	column_family: Change histogram to timed_rate_moving_average_and_histogram As part of moving the derived statistic in to scylla, this replaces the histogram object in the column_family to timed_rate_moving_average_and_histogram. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2016-05-17 11:53:15 +03:00
Glauber Costa	17b9203719	database: invert order of elements So that the sizes of the region can be initialized first Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <dc3df186a977b492d83c0a397f206c2db940aa37.1463448522.git.glauber@scylladb.com>	2016-05-17 11:28:39 +03:00
Glauber Costa	2ff6d38d0c	database: use a single constructor for the column family We've been keeping two constructors for the column family to allow for a version without the commitlog. But it's by now quite complicated to maintain the two, because changes always have to be made in two places. This patch adds a private constructor that does the actual construction, and have the public constructors to call it. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <dd3cb0b9c20ad154a6131bad6ece619f70ed5025.1463448522.git.glauber@scylladb.com>	2016-05-17 11:28:39 +03:00
Glauber Costa	8fede5b98e	memtables: isolate logic for disk writes disabled When we have disk writes disabled, we exit immediately from the flush function. We can just encode that separately and pass a different function in the memtable_list creation. That simplifies the memtable flush a bit. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <908e3b5eb2c6ee84b8ad7b31c3673be5531a087c.1463448522.git.glauber@scylladb.com>	2016-05-17 11:28:38 +03:00
Glauber Costa	4981362f57	memtables: always seal through memtable_list seal function I would like to be able to apply a function at the end of every flush, that is common for both memtables and streaming memtables. For instance, to unthrottle current waiters. Right now some calls to seal_active_memtable are open coded, calling the column family's function directly, for both the main memtable list and the streaming list. This patch moves all the current open code callers to call the respective memtable_list function. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <0c780254f3c4eb03e2bcd856b83941cf49a84b85.1463448522.git.glauber@scylladb.com>	2016-05-17 11:28:37 +03:00
Piotr Jastrzebski	dcba6f5c45	Pass clustering_row_ranges to mutation readers. This will allow readers to reduce the amount of data read. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-05-16 14:36:57 +02:00
Piotr Jastrzebski	23c23abe53	Make memtable mutation_reader slice using clustering ranges. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-05-16 11:46:41 +02:00
Piotr Jastrzebski	484d2ecd0a	Slice data with clustering key range in sstable reader Add additional parameters to mp_row_consumer to be able to fetch only cells for given clustering key ranges This will be used in row_cache when it will work on clustering key level instead of partition key level. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-05-16 11:46:30 +02:00
Pekka Enberg	d93d46e721	Merge "ALTER KEYSPACE" from Calle "Implementation of ALTER KEYSPACE. Fixes #429"	2016-05-10 22:07:06 +03:00
Piotr Jastrzebski	240a185727	Stop scanning keyspace data directory when populating. Iterate over column families and check/create directories for them instead of scanning keyspace data directory and filtering directories against column families that exist in system tables for this keyspace. Fixes #1008 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <26da66eec67a1ab1318917a66161915cdef924ab.1462890592.git.piotr@scylladb.com>	2016-05-10 17:35:55 +03:00
Calle Wilund	6ef7885ae3	database: Implement update_keyspace Reloads keyspace metadata and replaces in existing keyspace. Note: since keyspace metadata, and consequently, replication strategy now becomes volatile, keyspace::metadata now returns shared pointer by value (i.e. keep-alive). Replication strategy should receive the same treatment, but since it is extensively used, but never kept across a continuation, I've just added a comment for now.	2016-05-10 14:31:30 +00:00
Avi Kivity	80302d98dd	database: silence atomic deletion cancellation logs during compaction Those logs are expected during shutdown.	2016-05-07 20:37:48 +03:00
Raphael S. Carvalho	5aeeb0b3e8	compaction: add support to parallel compaction on the same column family It was noticed that small sstables will accumulate for a column family because scylla was limited to two compactions per shard, and a column family could have at most one compaction running at a given shard. With the number of sstables increasing rapidly, read performance is degraded. At the moment, our compaction manager works by running two compaction task handlers that run in parallel to the rest of the system. Each task handler gets to run when needed, gets a column family from compaction manager queue, runs compaction on it, and goes to sleep again. That's basically its cycle. Compaction manager only allows one instance of a column family to be on its queue, meaning that it's impossible for a column family to be compacted in parallel. One compaction starts after another for a given column family. To solve the problem described, we want to concurrently run compaction jobs of a column family that have different "size tier" (or "weight"). For those unfamiliar, compaction job contains a list of sstables that will be compacted together. The "size tier" of a compaction job is the log of the total size of the input sstables. So a compaction job only gets to run if its "size tier" is not the same as that of an ongoing compaction. There is no point in compacting concurrently at the same "size tier", because that slows down both compactions. We will no longer queue column families in compaction manager. Instead, we create a new fiber to run compaction on demand. This fiber that runs asynchronously will do the following: 1) Get a compaction job from compaction strategy. 2) Calculate "size tier" of compaction job. 3) Run compaction job if its "size tier" is not the same as that of an ongoing compaction for the given column family. As before, it may decide to re-compact a column family based on a stat stored in column family object. Ran all compaction-related dtests. Fixes #1216. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <d30952ff136192a522bde4351926130addec8852.1462311908.git.raphaelsc@scylladb.com>	2016-05-04 11:46:09 +03:00
Calle Wilund	9130b0de16	database.cc: Fix compilation error with boost 1.55 Message-Id: <1461067254-526-1-git-send-email-calle@scylladb.com>	2016-04-25 12:54:43 +03:00
Pekka Enberg	f6da9bc92b	Merge "Additional mutations/queries related collectd metrics" from Vlad "This series introduces some additional metrics (mostly) in a storage_proxy and a database level that are meant to create a better picture of how data flows in the cluster. First of all where possible counters of each category (e.g. total writes in the storage proxy level) are split into the following categories: - operations performed on a local Node - operations performed on remote Nodes aggregated per DC In a storage_proxy level there are the following metrics that have this "split" nature (all on a sending side): - total writes (attempts/errors) - writes performed as a result of a Read Repair logic - total data reads (attempts/completed/errors) - total digest reads (attempts/completed/errors) - total mutations data reads (attempts/completed/errors) In a batchlog_manager: - writes performed as a result of a batchlog replay logic Thereby if for instance somebody wants to get an idea of how many writes the current Node performs due to user requested mutations only he/she has to take a counter of total writes and subtract the writes resulted by Read Repairs and batchlog replays. On a receiving side of a storage_proxy we add the two following counters: - total number of received mutations - total number of forwarded mutations (attempts/errors) In order to get a better picture of what is going on on a local Node we are adding two counters on a database level: - total number of writes - total number of reads Comparing these to total writes/reads in a storage_proxy may give a good idea if there is an excessive access to a local DB for example."	2016-04-21 15:58:45 +03:00
Vlad Zolotarov	97e5bfa815	database: add metrics for total writes and reads This patch adds a counter of total writes and reads for each shard. It seems that nothing ensures that all database queries are ready before database object is destroyed. Make _stats lw_shared_ptr in order to ensure that the object is alive when lambda gets to incrementing it. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-04-21 11:28:53 +03:00
Duarte Nunes	c7b3a4b144	udt: Parse user types system table This patch loads and parses the user types system table during bootstrap. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 09:54:06 +02:00
Tomasz Grabiec	45527fcffa	Merge branch 'glommer/issue-1144-v5' From Glauber: There are currently some outstanding issues with the throttling code. It's easier to see them with the streaming code, but at least one of them is general. One of them is related to situations in which the amount of memory available leaves only one memtable fitting in memory. That would only happen with the general code if we set the memtable cleanup threshold to 100 % - and I don't even know if it is valid - but will happen quite often with the streaming code. If that happens, we'll start throttling when that memtable is being written, but won't be able to put anything else in its place - leading to unnecessary throttling. The second, and more serious, happens when we start throttling and the amount of available memory is not at least 1MB. This can deadlock the database in the sense that it will prevent any request from continuing, and in turn causing a flush due to memtable size. It is a good practice anyway to always guarantee progress. Fixes #1144	2016-04-18 12:20:13 +02:00
Glauber Costa	9c87ae3496	throttle: always release at least one request if we are below the limit Our current throttling code releases one request per 1MB of memory available that we have. If we are below the memory limit, but not by 1MB or more, then we will keep getting to unthrottle, but never really do anything. If another memtable is close to the flushing point, those requests may be exactly the ones that would make it flush. Without them, we'll freeze the database. In general, we need to always release at least one request to make sure that progress is always achieved. This fixes #1144 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-14 13:13:15 -04:00
Glauber Costa	2c5dfe08c1	memtable_list: make sure at least two memtables are available This is usually not a problem for the main memtable list - although it can be, depending on settings, but shows up easily for the streaming memtables list. We would like to have at least two memtables, even if we have to cut it short. If we don't do that, one memtable will use all available memory and we'll force throttling until the memtable gets totally flushed. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-14 12:12:50 -04:00
Glauber Costa	1daede7396	unnest throttle_state throttle_state is currently a nested member of database, but there is no particular reason - aside from the fact that it is currently only ever referenced by the database for us to do so. We'll soon want to have some interaction between this and the column family, to allow us to flush during throttle. To make that easier, let's unnest it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-14 12:12:50 -04:00
Glauber Costa	39def369ce	move information about memtables' region group inside memtable list This is a preparation patch so we can move the throttling infrastructure inside the memtable_list. To do that, the region group will have to be passed to the throttler so let's just go ahead and store it. In consequence of that, all that the CF has to tell us is what is the current schema - no longer how to create a new memtable. Also, with a new parameter to be passed to the memtable_list the creation code gets quite big and hard to follow. So let's move the creation functions to a helper. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-14 12:12:50 -04:00
Avi Kivity	a843aea547	db: delete compacted sstables atomically If sstables A, B are compacted, A and B must be deleted atomically. Otherwise, if A has data that is covered by a tombstone in B, and that tombstone is deleted, and if B is deleted while A is not, then the data in A is resurrected. Fixes #1181.	2016-04-14 17:14:26 +03:00
Paweł Dziepak	2db70cf912	database: remove throw() specifiers Most of them are missing std::bad_alloc (which leads to aborts) and they force the compiler to add unnecessary runtime checks. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-04-11 23:52:13 +01:00
Glauber Costa	8453ff7788	make get_sstable_key_range an instance method Because just creating an SSTable object does not generate any I/O, get_sstable_key_range should be an instance method. The main advantage of doing that is that we won't have to read the summary twice. The way we're doing it currently, if it happens to be a shard-relevant table we'll call load() - which reads the summary again. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-08 17:14:29 -04:00
Avi Kivity	db03295c8a	Merge "Fix query digest mismatch" from Tomasz "Currently data query digest includes cells and tombstones which may have expired or be covered by higher-level tombstones. This causes digest mismatch between replicas if some elements are compacted on one of the nodes and not on others. This mismatch triggers read-repair which doesn't resolve because mutations received by mutation queries are not differing, they are compacted already. The fix adds compacting step before writing and digesting query results by reusing the algorithm used by mutation query. This is not the most optimal way to fix this. The compaction step could be folded with the query writing, there is redundancy in both steps. However such change carries more risk, and thus was postponed. perf_simple_query test (cassandra-stress-like partitions) shows regression from 83k to 77k (7%) ops/s. Fixes #1165."	2016-04-08 12:13:29 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Tomasz Grabiec	f15c380a4f	database: Compact mutations when executing data queries Currently data query digest includes cells and tombstones which may have expired or be covered by higher-level tombstones. This causes digest mismatch between replicas if some elements are compacted on one of the nodes and not on others. This mismatch triggers read-repair which doesn't resolve because mutations received by mutation queries are not differing, they are compacted already. The fix adds compacting step before writing and digesting query results by reusing the algorithm used by mutation query. This is not the most optimal way to fix this. The compaction step could be folded with the query writing, there is redundancy in both steps. However such change carries more risk, and thus was postponed. perf_simple_query test (cassandra-stress-like partitions) shows regression from 83k to 77k (7%) ops/s. Fixes #1165.	2016-04-07 19:56:58 +02:00
Calle Wilund	ff5df306e3	database: Use disk-marking delete function in discard_sstables Fixes #797 To make sure an inopportune crash after truncate does not leave sstables on disk to be considered live, and thus resurrect data, after a truncate, use delete function that renames the TOC file to make sure we've marked sstables as dead on disk when we finish this discard call. Message-Id: <1458575440-505-2-git-send-email-calle@scylladb.com>	2016-03-24 12:02:08 +02:00
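
The queue length limit described in commit 8547f34d60 above (2% of the shard memory divided by an estimate of the queued request size, 1kb) is simple arithmetic. The following is a minimal sketch of it, not code from the Scylla tree: the function name `max_queued_reads`, its parameters and the integer rounding are assumptions made for illustration. The later commit 64df5f3f38 makes the size estimate more conservative, so the estimate is left as a parameter here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not taken from the Scylla sources.
// Caps the number of queued reads so that their estimated total
// footprint stays within 2% of the shard's memory.
// estimated_read_bytes must be non-zero.
constexpr std::size_t max_queued_reads(std::size_t shard_memory_bytes,
                                       std::size_t estimated_read_bytes = 1024) {
    // 2% of memory is memory / 50. Integer division rounds down,
    // so the bound errs on the side of a shorter queue.
    return (shard_memory_bytes / 50) / estimated_read_bytes;
}

// A shard with 1 GiB of memory and the 1kb estimate allows 20971 queued reads.
static_assert(max_queued_reads(std::size_t{1} << 30) == 20971,
              "2% of 1 GiB divided by 1 KiB");
```

Under these assumptions, a larger per-read estimate shrinks the allowed queue proportionally, which matches the commit's stated aim of bounding the memory held by queued disk requests when the disk cannot keep up.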

560 Commits