scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 20:46:56 +00:00

Author	SHA1	Message	Date
Paweł Dziepak	7a95847014	mutation_compactor: prepare for sstable compaction compact_mutation code is going to be shared among queries and sstable compaction. There are some differences though. Queries don't provide _max_purgeable and sstable compaction doesn't need any limits. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	00bcc05d36	mutation_compactor: _max_purgeable depends on the decorated key _max_purgeable can be different for each partition, since it is computed using sstables in which that partition is present (or likely to be present). Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	4133cc7a53	mutation_reader: make consume_flattened() produce decorated keys Since decorated keys are already computed it is better to pass more information than less. Consumers interested just in partition key can just drop token and the ones requiring full decorated key don't need to recompute it. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:00 +01:00
Paweł Dziepak	fe4b739828	mutation_compactor: rename compact_for_query to compact_mutation Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:37:54 +01:00
Paweł Dziepak	3e86f9ab73	mutation_partition: extract compact_for_query to a separate header The compacting logic inside compact_for_query is going to be shared with sstable compaction. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:37:54 +01:00
Paweł Dziepak	9b14c93677	streamed_mutation: return reference to decorated key Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:37:54 +01:00
Paweł Dziepak	3c08ffb275	query: add full_slice query::full_slice is a partition slice which has full clustering row ranges for all partition keys and no per-partition row limit. Options and columns are not set. It is used as a helper object in cases when a reference to partition_slice is needed but the user code needs just all data there is (an example of such case would be sstable compaction). Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:37:54 +01:00
Paweł Dziepak	599ed7f1ed	sstables: restore indentation Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:37:54 +01:00
Paweł Dziepak	e7ff20b3bb	sstables: run compaction code inside a thread Currently, each sstable write has its separate thread. However, the goal is to have compaction use consume_flattened() with a consumer that creates and writes the sstables. consume_flattened() needs to be executed inside a thread, since sstable writer may defer. This patch is a first step in preparations and it just makes whole compaction logic run inside a thread. That makes little sense now, since all sstable writes spawn their own threads but that's going to change in the following patches. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:37:54 +01:00
Gleb Natapov	8bf82cc31c	put additional info into cql timeout exception Fixes #1397 Message-Id: <20160628101829.GR14658@scylladb.com>	2016-06-30 12:03:48 +02:00
Paweł Dziepak	b70bf086b7	frozen_mutation: handle reversed streams properly Freezing streamed_mutations assumed that mutation fragments are streamed in the order they appear in the frozen mutation. That is not true for reversed streams. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1467277069-18702-1-git-send-email-pdziepak@scylladb.com>	2016-06-30 11:26:45 +02:00
Avi Kivity	9ac730dcc9	mutation_reader: make restricting_mutation_reader even more restricting While limiting the number of concurrently executing sstable readers reduces our memory load, the queued readers, although consuming a small amount of memory, can still grow without bounds. To limit the damage, add two limits on the queue: - a timeout, which is equal to the read timeout - a queue length limit, which is equal to 2% of the shard memory divided by an estimate of the queued request size (1kb) Together, these limits bound the amount of memory needed by queued disk requests in case the disk can't keep up. Message-Id: <1467206055-30769-1-git-send-email-avi@scylladb.com>	2016-06-29 15:17:35 +02:00
Raphael S. Carvalho	85cb2a6d35	database: trigger compaction on boot At the moment, we only trigger compaction after creating a new sstable as a result of memtable flush, or some other event such as changing compaction strategy of a column family. However, it's important to trigger compaction on boot too. That will happen after loading all column families. Fixes #1404. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <54d38a418157454eec97aaba6b8a6b6e51484db4.1467135349.git.raphaelsc@scylladb.com>	2016-06-29 13:47:42 +03:00
Amnon Heiman	610fe274fd	services: Make scylla-jmx service depends on scylla-server The scylla-jmx no longer shuts itself down. A better setup would be that it would be started when the scylla-server starts and that it would shut down when the scylla-server shuts down. This patch does the scylla-server part of the change. The scylla-server definition would Want the scylla-jmx.service so there is no need to enable the scylla-jmx.service. A patch to the scylla-jmx would cause it to shut down when the scylla-jmx shuts down. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1467184502-4358-1-git-send-email-amnon@scylladb.com>	2016-06-29 11:36:04 +03:00
Avi Kivity	2c4501f317	Merge seastar upstream * seastar c15055c...d4d9e16 (4): > semaphore: switch to chunked_fifo > fair_queue: add missing include > chunked_fifo: implement back() > Chunked FIFO queue	2016-06-28 19:30:29 +03:00
Avi Kivity	1b448877d7	Merge " thrift: Implement CQL over thrift" from Duarte "This patchset implements the CQL over thrift verbs. Only CQL3 is supported, and the CQL2 verbs are disabled."	2016-06-28 13:36:12 +03:00
Piotr Jastrzebski	59d0d9e666	Fix cache_tracker::clear Make sure that artificial entries for all column families are set to non continuous. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <f9e517fe40482c05f6c388faab7d6b9eca6b159e.1467103548.git.piotr@scylladb.com>	2016-06-28 11:18:23 +02:00
Piotr Jastrzebski	27575a0528	Fix previous_entry_is_continuous Rename it to check_previous_entry. Remove unnecessary test. Make sure ring_position always has working relation_to_keys method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <6bc790d492ba9b5c302a50218f3e26b924f657d0.1467101754.git.piotr@scylladb.com>	2016-06-28 10:27:08 +02:00
Piotr Jastrzebski	68e5a199e9	Clean continuous flag of cache entry preceding invalidated decorated key even when it's not found. Add test. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c7b8f4df37256363bf304e0396f84b5f37921b81.1467059472.git.piotr@scylladb.com>	2016-06-28 10:26:02 +02:00
Piotr Jastrzebski	cd9f3f94c4	Fix row_cache::update Clear continuous flag on the last cache entry with key smaller than a partition being dropped from memtable on flush and not saved in cache. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <0b5293cc0bf8bb858e62aa8dd00ae7fe7a484380.1467059472.git.piotr@scylladb.com>	2016-06-28 10:25:38 +02:00
Piotr Jastrzebski	eb959a8b81	Change check for artificial entry in cache_entry destructor from _key.has_key() to _lru_link.is_linked() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <f6d3d1bc49d9f6dd5b67a10cbe862466047b039d.1467059472.git.piotr@scylladb.com>	2016-06-28 10:24:29 +02:00
Nadav Har'El	164c760324	Switch compression chunk default from 64 KB to 4 KB Following Cassandra, our default sstable compression chunk size is 64 KB. The big downside of this default size is that small reads need to read and uncompress a large chunk, around 32 KB (if compression halves the data size). In this patch we switch the default chunk size to 4 KB, which allows faster small reads (the report in issue #1337 was of a 60-fold speedup...). Since commit `2f56577`, large reads will not be significantly slowed down by changing to a small chunk size. The remaining potential downside of this change is lowering of the compression ratio because of the smaller chunks individually compressed. However, experimentation shows that the compression ratio is hurt somewhat, but not dramatically, by lowering the chunk size: A recent survey of Cassandra compression in https://www.percona.com/blog/2016/03/09/evaluating-database-compression-methods/ reports a compression ratio of 2 for 64 KB chunks, vs. 1.75 for 4 KB chunks. My own test on a cassandra-stress workload (whose data is relatively hard to compress), showed compression ratio 1.25 for 64 KB chunk, vs. 1.23 for 4 KB chunks. Also remember that if a user wants to control the chunk length for a particular table, he can - the 64 KB or 4 KB sizes are just the default. Fixes #1337 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1467063335-12096-1-git-send-email-nyh@scylladb.com>	2016-06-28 08:50:24 +03:00
Tomasz Grabiec	6108d91362	scylla-gdb: Introduce scylla ptr Helps in identifying pointers allocated through seastar allocator. Shows to which thread the pointer belongs, to which size class, whether it's live or free, what's the offset relative to the live object. Example: (gdb) scylla ptr 0x6040abe88170 thread 1, small (size <= 320), live (0x6040abe88140 +48) Message-Id: <1467047215-1763-1-git-send-email-tgrabiec@scylladb.com>	2016-06-27 20:11:56 +03:00
Avi Kivity	22ec25b1b3	Merge seastar upstream * seastar 3029ebe...c15055c (5): > memory: add option to mlock() all memory > reactor: run idle poll handler with a pure poll function > ignore all but one failed futures in map_reduce > tutorial: more general exception printout on startup > resource: don't abort on too-high io queue count Fixes #1395. Fixes #1400.	2016-06-27 19:24:04 +03:00
Tomasz Grabiec	85a37cb379	Merge tag '1398/v3' from https://github.com/avikivity/scylla From Avi: Both the cql binary transport and the rpc server have protection against too many concurrent requests overwhelming the database due to transient allocations. They work by estimating the amount of memory a request requires, and accounting that against a semaphore. When the semaphore blocks, we stop dequeuing requests from the tcp connection. Unfortunately, this doesn't work for reads, because we can't estimate the required memory size. A small read request can require many sstables to be read, perhaps concurrently, and a large response to be generated. Fix by limiting the number of concurrent reads in a shard to 100. This is more than enough concurrency for any reasonable disk, and there is no network communication at this level, so we're safe from high network latency requiring high concurrency. Fixes #1398.	2016-06-27 18:04:33 +02:00
Avi Kivity	f03cd6e913	db: add statistics about queued reads	2016-06-27 17:25:08 +03:00
Avi Kivity	edeef03b34	db: restrict replica read concurrency Since reading mutations can consume a large amount of memory, which, moreover, is not predictable at the time the read is initiated, restrict the number of reads to 100 per shard. This is more than enough to saturate the disk, and hopefully enough to prevent allocation failures. Restriction is applied in column_family::make_sstable_reader(), which is called either on a cache miss or if the cache is disabled. This allows cached reads to proceed without restriction, since their memory usage is supposedly low. Reads from the system keyspace use a separate semaphore, to prevent user reads from blocking system reads. Perhaps we should select the semaphore based on the source of the read rather than the keyspace, but for now using the keyspace is sufficient.	2016-06-27 17:17:56 +03:00
Avi Kivity	bea7d7ee94	mutation_reader: introduce restricting_reader A restricting_reader wraps a mutation_reader, and restricts its concurrency using a provided semaphore; this allows controlling read concurrency, which is important since reads can consume a lot of resources ((number of participating sstables) * 128k after we have streaming mutations, and a lot more before).	2016-06-27 17:17:52 +03:00
Duarte Nunes	d31b52a07b	thrift: Disable CQL2 verbs And make set_cql_version a no-op. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-27 15:39:33 +02:00
Duarte Nunes	60094f4033	thrift: Implement execute_prepared_cql3_query verb Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-27 15:39:28 +02:00
Duarte Nunes	96068084ca	thrift: Implement prepare_cql3_query verb Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-27 15:39:22 +02:00
Duarte Nunes	c8afb4cc46	query_processor: Support thrift prepared statements This patch adds support for thrift prepared statements. It specializes the result_message::prepared into two types: result_message::prepared::cql and result_message::prepared::thrift, as their identifiers have different types. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-27 15:39:02 +02:00
Paweł Dziepak	1addbb9c1d	thrift: implement execute_cql3_query Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com> Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-27 15:38:52 +02:00
Duarte Nunes	2e7cb32601	query_options: Adjust value_views after prepare() query_options::prepare() changes the values array, but this is not the one used by query_options internally (e.g., in get_value_at). So we need to also recalculate the value_views after prepare() is called. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-27 15:24:27 +02:00
Duarte Nunes	2683a49c69	query_options: Remove value_views arg from ctor Having both the values and value_views arguments in the query_options ctor is confusing, since query_options uses only the value_views field but that is not communicated to the caller. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-27 15:24:27 +02:00
Duarte Nunes	62cfc4ab55	thrift: Add with_exn_cob helper function Similarly to the with_cob functions, this one takes the exn_cob function and ensures it is called in case of an exception. This is useful when the return type of the thrift verb is not nothrow move constructible; by holding on to the cob inside the verb and calling it directly when we have the result we avoid having to wrap it in a smart pointer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-27 15:24:27 +02:00
Duarte Nunes	b74ee6fdea	thrift: Add consistency level conversion Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-27 15:24:27 +02:00
Paweł Dziepak	0c441378f2	client_state: support thrift clients Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com> Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-27 15:24:27 +02:00
Paweł Dziepak	002d2bc353	thrift: pass query_processor to the thrift handler Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com> Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-27 15:24:27 +02:00
Duarte Nunes	225c5be78e	thrift: Add query_state to thrift_handler This patch adds a query_state object to the thrift handler, as it is required for CQL3 operations. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-27 15:24:27 +02:00
Avi Kivity	f96e5d7c1b	managed_bytes: fix build with gcc 6 gcc 6 complains that deleting a managed_bytes::external isn't defined because the size isn't known. I'm not sure it's correct, but there's no way to tell because flexible arrays aren't standardized. Fix by using an array of zero size. Message-Id: <1466715187-4125-1-git-send-email-avi@scylladb.com>	2016-06-27 10:54:10 +02:00
Avi Kivity	056b427855	range_tombstone_list: use non-template lambda for cloning tombstones Using a template lambda invokes a bug in Fedora 24's boost where the lambda's parameter is an internal boost type rather than a range_tombstone. Constraining the parameter with an explicit type avoids the problem. Message-Id: <1466844211-17298-1-git-send-email-avi@scylladb.com>	2016-06-27 10:48:59 +02:00
Amnon Heiman	a439a6b8d3	API: Add the collectd enable/disable implementation This adds the implementation of enabling and disabling the collectd metrics. An example of disabling all collectd metrics that have write in their type_instance part: curl -X POST --header "Content-Type: application/json" --header "Accept: application/json" "http://localhost:10000/collectd/.?instance=.&type=.&type_instance=.write.&enable=false" After that a call to: curl -X GET "http://localhost:10000/collectd/" Would return those metrics with the enable set to "false" An example of enabling all the metrics in cache whose type starts with byt: curl -X POST --header "Content-Type: application/json" --header "Accept: application/json" "http://localhost:10000/collectd/cache?type=byt.&enable=true" Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1466932139-19264-3-git-send-email-amnon@scylladb.com>	2016-06-26 12:26:50 +03:00
Amnon Heiman	4d7837af40	API Definition: collectd to support enable disable This adds to the definition of the collectd API the ability to turn on and off specific collectd metrics. For the GET end point a POST option was added that allows enabling or disabling a metric. The general GET endpoint now returns the enable flag that indicates if the metric is enabled. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1466932139-19264-2-git-send-email-amnon@scylladb.com>	2016-06-26 12:26:48 +03:00
Duarte Nunes	dfbf68cd24	commitlog: Define operator<< in namespace db Needed for compilation with gcc6. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1466852874-8448-1-git-send-email-duarte@scylladb.com>	2016-06-26 10:05:28 +03:00
Avi Kivity	5b81448ed6	main: add scylla --version option Fixes #1384. Message-Id: <1466691517-29964-1-git-send-email-avi@scylladb.com>	2016-06-23 16:24:03 +02:00
Duarte Nunes	1ffae6e6ee	database_test: Add test case for row limit This patch introduces database_test and adds a test case to ensure the row limit is respected when querying multiple partition ranges. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20160623111723.17523-1-duarte@scylladb.com>	2016-06-23 14:20:34 +02:00
Avi Kivity	e647ec1c4a	Merge "thrift: Implement describe verbs" from Duarte "This patchset implements the thrift describe verbs: - describe_keyspace - describe_keyspaces - describe_cluster_name - describe_version - describe_ring - describe_local_ring - describe_token_map - describe_partitioner - describe_snitch - describe_schema_versions The verbs describe_splits and describe_splits_ex are not implemented because they are marked as experimental (Origin's thrift interface has this to say about them: "experimental API for hadoop/parallel query support. may change violently and without warning."). Some drivers have moved away from depending on this verb (SPARKC-94). The correct way to implement the verbs for us would be to use the size_estimates system table (CASSANDRA-7688). However, we currently don't populate size_estimates, which is done by SizeEstimatesRecorder.java in Origin."	2016-06-23 13:30:39 +03:00
Duarte Nunes	b291c22e39	thrift: Complete describe_keyspace verb This patch completes the describe_keyspace verb by setting the remaining fields. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-23 12:02:47 +02:00
Duarte Nunes	febc48166d	thrift: Type name is already based on Origin This patch removes a conversion function from an internal type name to Origin's naming, which isn't needed because the abstract_type hierarchy already keeps that mapping. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-23 12:02:47 +02:00
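
Commit 9ac730dcc9 above describes a queue length limit equal to 2% of the shard memory divided by an estimated queued request size of 1 KB. A minimal sketch of that arithmetic follows. The function name max_queued_reads and its parameters are hypothetical and are not taken from the Scylla source; this only illustrates the calculation stated in the commit message, not the actual implementation.

```cpp
#include <cstddef>

// Hypothetical helper (an assumption, not from the Scylla source): computes
// the queue length limit as described in the commit message, i.e. 2% of the
// shard's memory divided by an estimated size per queued request.
constexpr std::size_t max_queued_reads(std::size_t shard_memory_bytes,
                                       std::size_t est_request_bytes = 1024) {
    // 2% of memory is memory / 50; integer division rounds down.
    return (shard_memory_bytes / 50) / est_request_bytes;
}

// Example: a shard with 1 GiB of memory may queue up to 20971 reads.
static_assert(max_queued_reads(std::size_t(1) << 30) == 20971,
              "1 GiB shard, 1 KB per queued request");
```

Under these assumptions the limit scales linearly with shard memory, so the memory held by queued reads stays near 2% of the shard regardless of machine size.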

9736 Commits