scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 19:21:01 +00:00

Author	SHA1	Message	Date
Paweł Dziepak	4ca7f0a491	thrift: add support for counter operations	2017-02-02 10:35:14 +00:00
Paweł Dziepak	fa29ef3cc0	cql3: allow counters in CREATE TABLE statements	2017-02-02 10:35:14 +00:00
Paweł Dziepak	fce6e0987f	cql3: selection: do not panic when seeing counters At this stage counter cells are already long_type values, so no special handling is necessary.	2017-02-02 10:35:14 +00:00
Paweł Dziepak	1e8814f5ce	storage_proxy: support counter updates	2017-02-02 10:35:14 +00:00
Paweł Dziepak	c14c6b753b	storage_proxy: add get_live_endpoints()	2017-02-02 10:35:14 +00:00
Paweł Dziepak	d6ebf84edf	cql3: add counter increment and decrement operations	2017-02-02 10:35:14 +00:00
Paweł Dziepak	5a0955e89d	db: add operations for applying counter updates	2017-02-02 10:35:14 +00:00
Paweł Dziepak	8d889082bf	counters: implement transforming counter deltas to shards The leader receives counter updates as deltas which have to be transformed to counter shards. In order to do that, current local shard of the modified counter cell needs to be read, logical clock incremented and the value modified by the specified delta.	2017-02-02 10:35:14 +00:00
Paweł Dziepak	55277b3182	add infrastructure for locking counter cells The leader receives counter updates in the form of deltas which need to be transformed to counter shards. In order to do that the node needs to read its current state of the modified counter cells. Since this is essentially a read-modify-write operation an appropriate locking mechanism is needed. Counter cell locker introduced in this patch uses a hashtable of partition entries, each containing a hashtable of cell entries. Inside a cell entry there is a semaphore used for synchronization. Once no longer needed, cell entries and partition entries are removed. In order to avoid deadlocks cell entries are always locked in the same order, which is the lexicographical order of (clustering key, column id) pairs. Note that schema changes are not a difficulty since they do not make it possible to change the ordering of such pairs.	2017-02-02 10:35:14 +00:00
Paweł Dziepak	22fbb11f90	add fnv1a hasher	2017-02-02 10:35:14 +00:00
Paweł Dziepak	a16761dcb4	position_in_partition: add feed_hash()	2017-02-02 10:35:14 +00:00
Paweł Dziepak	f4fce93807	position_in_partition: add functions for querying object type	2017-02-02 10:35:14 +00:00
Paweł Dziepak	53d9a6f220	types: make counter_type_impl report its cql3_type	2017-02-02 10:35:14 +00:00
Paweł Dziepak	a805bea97a	transport: encode counters as long_type For the purposes of CQL counters are long values (either a delta in case of writes or the final value for reads).	2017-02-02 10:35:14 +00:00
Paweł Dziepak	b6564651e4	mutation_partition: make for_each_cell() accessible outside source file for_each_cell() const already can be used from any place in the code, allow the same with non-const version.	2017-02-02 10:35:14 +00:00
Paweł Dziepak	bf60b7844b	messaging_service: add COUNTER_MUTATION verb This verb is going to be used for coordinator<->leader communication during counter updates.	2017-02-02 10:35:14 +00:00
Paweł Dziepak	67ca6959bd	storage_service: add COUNTERS feature	2017-02-02 10:35:14 +00:00
Paweł Dziepak	9989239c97	idl: add idl description of consistency level	2017-02-02 10:35:14 +00:00
Paweł Dziepak	4b3c0db5cc	schema: make is_counter() return correct value	2017-02-02 10:35:14 +00:00
Paweł Dziepak	99b21fbb86	tests: random_mutation_generator: generate counter cells	2017-02-02 10:35:14 +00:00
Paweł Dziepak	de2acd47c9	tests/sstables: test reading and writing counters	2017-02-02 10:35:14 +00:00
Paweł Dziepak	83c6fc1114	sstables: write counter cells	2017-02-02 10:35:14 +00:00
Paweł Dziepak	5905729c4a	sstables: read counter cells	2017-02-02 10:35:14 +00:00
Paweł Dziepak	de698105e4	tests/counter: test apply, difference and freeze	2017-02-02 10:35:14 +00:00
Paweł Dziepak	0c93d01232	atomic_cell: make sure upper level tombstones cover counters Support for deletion of counters is limited in a way that once deleted they cannot be used again (i.e. tombstone always wins, regardless of the timestamp). Logic responsible for merging two counter cells already makes sure that tombstones are handled properly, but it is also necessary to ensure that higher level tombstones always cover counters.	2017-02-02 10:35:14 +00:00
Paweł Dziepak	9f1ebd4f7c	idl/mutation: add counter serialisation logic	2017-02-02 10:35:14 +00:00
Paweł Dziepak	47d14906e6	mutation_partition: support querying counter cells	2017-02-02 10:35:14 +00:00
Paweł Dziepak	63f25eb12c	mutation_hasher: handle counter cells properly	2017-02-02 10:35:14 +00:00
Paweł Dziepak	25c8ed1c71	feed_hash: allow additional arguments	2017-02-02 10:35:14 +00:00
Paweł Dziepak	a57e86cc37	mutation_partition: compute counter difference	2017-02-02 10:35:13 +00:00
Paweł Dziepak	2725a4945d	mutation_partition: apply counter cells properly	2017-02-02 10:35:13 +00:00
Paweł Dziepak	496b42fcc7	tests: add test for counters	2017-02-02 10:35:13 +00:00
Paweł Dziepak	7bb5b49799	add in memory representation of counters Live counter cells are collections of shards, each one representing the sum of all operations performed by a particular replica. This commit introduces an in-memory representation of counters as well as basic operations such as merge, difference and hashing.	2017-02-02 10:35:13 +00:00
Paweł Dziepak	c66db213d3	storage_service: allow getting local host id without futures<>	2017-02-02 10:35:13 +00:00
Paweł Dziepak	0a8f00c159	atomic_cell: add flag for recognizing counter updates A counter cell may be either a collection of shards or just a delta. The former can only appear in certain places on coordinator and leader.	2017-02-02 10:35:13 +00:00
Paweł Dziepak	ab344c5aa3	mutation_partition_view: extract atomic_cell variant	2017-02-02 10:35:13 +00:00
Paweł Dziepak	83f6018ea2	schema: keep counter information in column definition	2017-02-02 10:35:13 +00:00
Avi Kivity	aec419da13	Merge seastar upstream * seastar c1dbd89...f07f8ed (3): > Merge "Introduce when_all_succeed()" from Paweł > tests: adjust collectd test for metric API change > Merge "DNS query support" from Calle	2017-02-02 12:30:10 +02:00
Piotr Jastrzebski	15cc8460bd	mutation_partition: make rows_entry constructors explicit All converting constructors should be explicit otherwise they can create a confusion. I got myself in such a situation when clustering key got implicitly converted into rows_entry when I was not expecting it. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c3f19719760f6dc7cf5e858b9c452506faedf521.1485950529.git.piotr@scylladb.com>	2017-02-01 17:57:50 +01:00
Tomasz Grabiec	2fd339787b	tests: lsa: Add test for reclaimer starting and stopping	2017-02-01 17:41:56 +01:00
Tomasz Grabiec	f943296da0	tests: lsa: Add request releasing stress test	2017-02-01 17:41:55 +01:00
Tomasz Grabiec	e40fb438f5	lsa: Avoid avalanche releasing of requests Before, the logic for releasing writes blocked on dirty worked like this: 1) When region group size changes and it is not under pressure and there are some requests blocked, then schedule request releasing task 2) request releasing task, if no pressure, runs one request and if there are still blocked requests, schedules next request releasing task If requests don't change the size of the region group, then either some request executes or there is a request releasing task scheduled. The amount of scheduled tasks is at most 1, there is a single thread of execution. However, if requests themselves would change the size of the group, then each such change would schedule yet another request releasing thread, growing the task queue size by one. The group size can also change when memory is reclaimed from the groups (e.g. when it contains sparse segments). Compaction may start many request releasing threads due to group size updates. Such behavior is detrimental for performance and stability if there are a lot of blocked requests. This can happen on 1.5 even with modest concurrency because timed out requests stay in the queue. This is less likely on 1.6 where they are dropped from the queue. The releasing of tasks may start to dominate over other processes in the system. When the amount of scheduled tasks reaches 1000, polling stops and server becomes unresponsive until all of the released requests are done, which is either when they start to block on dirty memory again or run out of blocked requests. It may take a while to reach pressure condition after memtable flush if it brings virtual dirty much below the threshold, which is currently the case for workloads with overwrites producing sparse regions. Refs #2021. Fix by ensuring there is at most one request releasing thread at a time. There will be one releasing fiber per region group which is woken up when pressure is lifted. It executes blocked requests until pressure occurs. The logic for notification across hierarchy was replaced by calling region_group::notify_relief() from region_group::update() on the broadest relieved group.	2017-02-01 17:41:55 +01:00
Tomasz Grabiec	d55baa0cd1	lsa: Move definitions to .cc	2017-02-01 17:41:55 +01:00
Tomasz Grabiec	8f8b111b33	lsa: Simplify hard pressure notification management The hard pressure was only signalled on region group when run_when_memory_available() was called after the pressure condition was met. So the following loop is always an infinite loop rather than stopping when enough is allocated to cause pressure: while (!gr.under_pressure()) { region.allocate(...); } It's cleaner if pressure notification works not only if run_when_memory_available() is used but whenever condition changes, like we do for the soft pressure. There is a comment in run_when_memory_available() which gives reasons why notifications are called from there, but I think those reasons no longer hold: - we already notify on soft pressure conditions from update(), and if that is safe, notifying about hard pressure should also be safe. I checked and it looks safe to me. - avoiding notification in the rare case when we stopped writing right after crossing the threshold doesn't seem beneficial. It's unlikely in the first place, and one could argue it's better to actually flush now so that when writes resume they will not block.	2017-02-01 17:41:55 +01:00
Tomasz Grabiec	9aa1be5d08	lsa: Do not start or stop reclaiming on hard pressure We already call these when crossing the soft threshold. We shouldn't stop reclaiming when hard pressure is gone because soft pressure may still be present. Calling start_reclaiming() on hard pressure is unnecessary because soft pressure also starts it, and when there is hard pressure there is also soft pressure.	2017-02-01 17:40:15 +01:00
Amnon Heiman	45b6070832	Merge seastar upstream * seastar 397685c...c1dbd89 (13): > lowres_clock: drop cache-line alignment for _timer > net/packet: add missing include > Merge "Adding histogram and description support" from Amnon > reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&' > Set the option '--server' of tests/tcp_sctp_client to be required > core/memory: Remove superfluous assignment > core/memory: Remove dead code > core/reactor: Use logger instead of cerr > fix inverted logic in overprovision parameter > rpc: fix timeout checking condition > rpc: use lowres_clock instead of high resolution one > semaphore: make semaphore's clock configurable > rpc: detect timedout outgoing packets earlier Includes treewide change to accommodate rpc changing its timeout clock to lowres_clock. Includes fixup from Amnon: collectd api should use the metrics getters As part of the preparation for the change in the metrics layer, this changes the way the collectd api uses the metrics value to use the getters instead of calling the member directly. This will be important when the internal implementation changes from union to variant. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>	2017-02-01 14:39:08 +02:00
Glauber Costa	facb0aa6d9	row_cache: rewrite loop so that debug mode doesn't become a noop need_preempt() is always true in debug mode. Because of that, this loop will never be executed. Rewrite it as a do-while loop so we are sure that it is executed at least once - or exactly once in debug mode. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <1485913079-1283-1-git-send-email-glauber@scylladb.com>	2017-02-01 10:02:13 +02:00
Tomasz Grabiec	634761dbba	commitlog: Fix default limit for size on disk The per-node limit will be total memory divided by number of shards instead of just total memory. For example, when Scylla is started with -c16 -m16G, the commit log will induce flushes on given shard when unflushed data exceeds on that shard 62MB instead of 1GB. Fixes #2046. Message-Id: <1485874534-10939-1-git-send-email-tgrabiec@scylladb.com>	2017-01-31 17:12:59 +02:00
Piotr Jastrzebski	c7e95af0b0	row_cache_test: fix test_mvcc Currently the test does not wait for cache update to finish before carrying on with the checks. This makes the test nondeterministic and purely wrong because checks expect update to be finished. This patch changes the test to wait for update to finish. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <2a99bba24b1628466d3495332b48ef3ccdb43c26.1485862389.git.piotr@scylladb.com>	2017-01-31 11:37:29 +00:00
Avi Kivity	aedb5e5cfa	mutation_fragment: add std::ostream support Helps poor debuggers. Message-Id: <20170130163605.4858-1-avi@scylladb.com>	2017-01-31 10:37:42 +01:00
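
The counter commits above (8d889082bf, 7bb5b49799) describe a counter cell as a collection of per-replica shards, with the leader turning each incoming delta into an updated local shard by reading it, incrementing its logical clock, and adding the delta. A minimal C++ sketch of that idea follows. All names here (`counter_shard`, `counter_cell`, `apply_delta`, `merge`, `total`) are hypothetical and are not ScyllaDB's actual types or functions, and the merge rule (keep the shard with the higher logical clock per replica) is an assumption that the commit messages do not spell out.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical illustration only; not ScyllaDB's real data structures.
struct counter_shard {
    int64_t value = 0;          // sum of all deltas applied by this replica
    int64_t logical_clock = 0;  // incremented on every local update
};

// A live counter cell: one shard per replica id (ids are plain strings here).
using counter_cell = std::map<std::string, counter_shard>;

// Leader-side step: read the current local shard, increment its logical
// clock, and modify its value by the delta. Returns the updated shard.
counter_shard apply_delta(counter_cell& cell, const std::string& local_id, int64_t delta) {
    counter_shard& s = cell[local_id];  // a missing shard starts at {0, 0}
    s.logical_clock += 1;
    s.value += delta;
    return s;
}

// Assumed merge rule: for each replica id, keep the shard with the
// higher logical clock.
void merge(counter_cell& dst, const counter_cell& src) {
    for (const auto& entry : src) {
        auto it = dst.find(entry.first);
        if (it == dst.end() || it->second.logical_clock < entry.second.logical_clock) {
            dst[entry.first] = entry.second;
        }
    }
}

// The value a reader sees is the sum over all shards.
int64_t total(const counter_cell& cell) {
    int64_t sum = 0;
    for (const auto& entry : cell) {
        sum += entry.second.value;
    }
    return sum;
}
```

Because the update is a read-modify-write on the local shard, concurrent updates to the same cell would need the kind of locking described in 55277b3182; the sketch leaves that out.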

11716 Commits