scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 03:56:42 +00:00

Author	SHA1	Message	Date
Duarte Nunes	325f917d8a	system_keyspace: Correctly deal with wrapped ranges This patch ensures we correctly deal with ranges that wrap around when querying the size_estimates system table. Ref #693 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1470412433-7767-1-git-send-email-duarte@scylladb.com> (cherry picked from commit `e0a43a82c6`)	2016-08-07 17:21:58 +03:00
Piotr Jastrzebski	ec3d59bf13	Add flag to configure max size of a cached partition. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> (cherry picked from commit `636a4acfd0`)	2016-07-27 14:09:34 +03:00
Avi Kivity	0523000af5	size_estimates_recorder: unwrap ranges before searching for sstables column_family::select_sstables() requires unwrapped ranges, so unwrap them. Fixes crash with Leveled Compaction Strategy. Fixes #1507. Reviewed-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1469563488-14869-1-git-send-email-avi@scylladb.com> (cherry picked from commit `64d0cf58ea`)	2016-07-27 10:07:13 +03:00
Duarte Nunes	aaa9b5ace8	system_keyspace: Add query_size_estimates() function The query_size_estimates() function queries the size_estimates system table for a given keyspace and table, filtering out the token ranges according to the specified tokens. Signed-off-by: Duarte Nunes <duarte@scylladb.com> (cherry picked from commit `ecfa04da77`)	2016-07-25 13:43:16 +03:00
Duarte Nunes	8d491e9879	size_estimates_recorder: Fix stop() This patch fixes stop() by checking the current CPU instead of whether the service is active (which it won't be at the time stop() is called). Signed-off-by: Duarte Nunes <duarte@scylladb.com> (cherry picked from commit `d984cc30bf`)	2016-07-25 13:43:08 +03:00
Duarte Nunes	b63c9fb84b	system_keyspace: Avoid pointers in range_estimates This patch makes range_estimates a proper struct, where tokens are represented as dht::tokens rather than dht::ring_position*. We also pass other arguments to update_ and clear_size_estimates by copy, since one will already be required. Signed-off-by: Duarte Nunes <duarte@scylladb.com> (cherry picked from commit `e16f3f2969`)	2016-07-25 13:42:53 +03:00
Tomasz Grabiec	35c1781913	schema_tables: Fix hang during keyspace drop Fixes #1484. We drop tables as part of keyspace drop. Table drop starts with creating a snapshot on all shards. All shards must use the same snapshot timestamp which, among other things, is part of the snapshot name. The timestamp is generated using supplied timestamp generating function (joinpoint object). The joinpoint object will wait for all shards to arrive and then generate and return the timestamp. However, we drop tables in parallel, using the same joinpoint instance. So joinpoint may be contacted by snapshotting shards of tables A and B concurrently, generating timestamp t1 for some shards of table A and some shards of table B. Later the remaining shards of table A will get a different timestamp. As a result, different shards may use different snapshot names for the same table. The snapshot creation will never complete because the sealing fiber waits for all shards to signal it, on the same name. The fix is to give each table a separate joinpoint instance. Message-Id: <1469117228-17879-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `5e8f0efc85`)	2016-07-22 15:36:45 +02:00
Pekka Enberg	e8cb163cdf	db/config: Start Thrift server by default We have Thrift support now so start the server by default. Message-Id: <1469002000-26767-1-git-send-email-penberg@scylladb.com> (cherry picked from commit `aff8cf319d`)	2016-07-20 11:29:24 +03:00
Tomasz Grabiec	9c430c2cff	schema_tables: Add more logging Message-Id: <1468917771-2592-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `a0832f08d2`)	2016-07-20 10:13:28 +03:00
Vlad Zolotarov	b36b69c1d6	service::storage_proxy: remove a default value for a tracing::trace_state_ptr parameter Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:59 +03:00
Vlad Zolotarov	baa6496816	service::storage_proxy: READ instrumentation: store trace state object in abstract_read_executor Having a trace_state_ptr in the storage_proxy level is needed to trace code bits in this level. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:59 +03:00
Duarte Nunes	9ffdf4a5cd	db: Implement size_estimates_recorder This patch implements the size_estimates_recorder, which periodically writes estimations for all the non-system column families in the size_estimates system table. The size_estimates_recorder class corresponds to the one in Cassandra's SizeEstimatesRecorder.java. Estimation is carried out by shard 0. Since we're estimating based on data in shared sstables, having multiple shards doing this would skew the results. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-19 09:44:58 +00:00
Duarte Nunes	f8f61cf246	system_keyspace: Record and clear size estimates This patch implements functions that allow the size_estimates system table to be updated and cleared. The size_estimates table is updated per schema with a set of token ranges and the associated estimations of how many partitions there are and their mean size. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-18 23:58:31 +00:00
Gleb Natapov	9cc076c9f3	storage_proxy: preserve endpoint's order while filtering local nodes for query filter_for_query() gets a list of endpoints sorted by preference and should preserve that order after filtering out non-local endpoints for a local query. partition() does not guarantee this while stable_partition() does, so use it instead. Fixes #1450. Message-Id: <20160713100909.GM10767@scylladb.com>	2016-07-13 13:17:28 +03:00
Glauber Costa	73a70e6d0a	config: Use Scylla in user visible options We have imported most of our data about config options from Cassandra. Due to that, many options that mention the database by name are still using "Cassandra". Especially for the user visible options, which is something that a user sees, we should really be using Scylla here. This patch was created by automatically replacing every occurrence of "Cassandra" with "Scylla" and then later on discarding the ones in which the change didn't make sense (such as Unused options and mentions of the Cassandra documentation) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <1423e1d7e36874a1f46bd091aec96dcb4d8482d9.1468267193.git.glauber@scylladb.com>	2016-07-12 09:18:17 +03:00
Gleb Natapov	726b79ea91	messaging_service: enable internode_compression option Use LZ4 for internode compression if enabled. Message-Id: <20160711141734.GZ18455@scylladb.com>	2016-07-11 18:30:21 +03:00
Calle Wilund	4ab03e98cf	commitlog: Ensure we don't end up in a loop when we must wait for alloc Continuation reordering could cause us to repeatedly see the segment-local flag var even though actual write/sync ops are done. Can cause wild recursion without actual delayed continuation -> SOE. Fix by also checking queue status, since this is the wait object. Message-Id: <1468234873-13581-1-git-send-email-calle@scylladb.com>	2016-07-11 14:12:38 +03:00
Tomasz Grabiec	8c4b5e4283	db: Avoid checking bloom filters during compaction Checking bloom filters of sstables to compute max purgeable timestamp for compaction is expensive in terms of CPU time. We can avoid calculating it if we're not about to GC any tombstone. This patch changes compacting functions to accept a function instead of a ready value for max_purgeable. I verified that bloom filter operations no longer appear on flame graphs during a compaction-heavy workload (without tombstones). Refs #1322.	2016-07-10 09:54:20 +02:00
Asias He	f4389349e4	config: Enable partitioner option Enable --partitioner option so that user can choose partitioner other than the default Murmur3Partitioner. Currently, only Murmur3Partitioner and ByteOrderedPartitioner are supported. When an unsupported partitioner is specified, the error will be propagated to the user.	2016-07-08 17:44:55 +08:00
Glauber Costa	7169b727ea	move system tables to its own region In the spirit of what we are doing for the read semaphore, this patch moves system writes to its own dirty memory manager. Not only will it make sure that system tables will not be serialized by its own semaphore, but it will also put system tables in its own region group. Moving system tables to its own region group has the advantage that system requests won't be waiting during throttle behind a potentially big queue of user requests, since requests are tended to in FIFO order within the same region group. However, system tables being more controlled and predictable, we can actually go a step further and give them some extra reservation so they may not necessarily block even if under pressure (up to 10 MB more). Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 17:46:28 -04:00
Avi Kivity	76cc6408cd	Merge "feature check for seed node" from Asias "This series implements feature check for seed node.	2016-07-05 19:01:01 +03:00
Asias He	6f69963ef9	system_keyspace: Simplify load_host_ids implementation - Use plain loop instead of do_for_each - Use row.get_as() instead of row.template get_as() Message-Id: <3e108d3a6258c0caaf569eb9c79532d9789ea411.1467703722.git.asias@scylladb.com>	2016-07-05 09:47:21 +02:00
Asias He	3f31be58b6	system_keyspace: Simplify load_tokens implementation - Use plain loop instead of do_for_each - Use row.get_as() instead of row.template get_as() Message-Id: <f959ace4f30078695d383c849ed4520169228f97.1467703722.git.asias@scylladb.com>	2016-07-05 09:47:21 +02:00
Asias He	31df4e5316	system_keyspace: Introduce load_peer_features To get the peer features stored in the system.peers table.	2016-07-05 10:09:53 +08:00
Avi Kivity	2a46410f4a	Change sstable_list from a map to a set sstable_list is now a map<generation, sstable>; change it to a set in preparation for replacing it with sstable_set. The change simplifies a lot of code; the only casualty is the code that computes the highest generation number.	2016-07-03 10:26:57 +03:00
Avi Kivity	9ac730dcc9	mutation_reader: make restricting_mutation_reader even more restricting While limiting the number of concurrently executing sstable readers reduces our memory load, the queued readers, although consuming a small amount of memory, can still grow without bounds. To limit the damage, add two limits on the queue: - a timeout, which is equal to the read timeout - a queue length limit, which is equal to 2% of the shard memory divided by an estimate of the queued request size (1kb) Together, these limits bound the amount of memory needed by queued disk requests in case the disk can't keep up. Message-Id: <1467206055-30769-1-git-send-email-avi@scylladb.com>	2016-06-29 15:17:35 +02:00
Avi Kivity	edeef03b34	db: restrict replica read concurrency Since reading mutations can consume a large amount of memory, which, moreover, is not predictable at the time the read is initiated, restrict the number of reads to 100 per shard. This is more than enough to saturate the disk, and hopefully enough to prevent allocation failures. Restriction is applied in column_family::make_sstable_reader(), which is called either on a cache miss or if the cache is disabled. This allows cached reads to proceed without restriction, since their memory usage is supposedly low. Reads from the system keyspace use a separate semaphore, to prevent user reads from blocking system reads. Perhaps we should select the semaphore based on the source of the read rather than the keyspace, but for now using the keyspace is sufficient.	2016-06-27 17:17:56 +03:00
Duarte Nunes	dfbf68cd24	commitlog: Define operator<< in namespace db Needed for compilation with gcc6. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1466852874-8448-1-git-send-email-duarte@scylladb.com>	2016-06-26 10:05:28 +03:00
Duarte Nunes	aacc7193f2	schema: Replace keyspace's schema_ptr on CF update This patch ensures we replace the schema_ptr held by its respective keyspace object when a column family is being updated. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20160623085710.26168-1-duarte@scylladb.com>	2016-06-23 11:11:52 +02:00
Calle Wilund	2b812a392a	commitlog_replayer: Fix calculation of global min pos per shard If a CF does not have any sstables at all, we should treat it as having a replay position of zero. However, since we also must deal with potential re-sharding, we cannot just set shard->uuid->zero initially, because we don't know what shards existed. Go through all CF:s post map-reduce, and for every shard where a CF does not have an RP-mapping (no sstables found), set the global min pos (for shard) to zero. Fixes #1372 Message-Id: <1465991864-4211-1-git-send-email-calle@scylladb.com>	2016-06-21 10:05:05 +03:00
Calle Wilund	88ffe60138	batchlog_manager: Change replay mutation CL to ALL Try to emulate the origin behaviour for batch replay. They use an explicit write handler, combining 1.) Hinting to all known dead endpoints 2.) Sending to all presumed live, requiring ack from all 3.) Hinting to endpoint to which send failed. We don't have hints, so try to work around by doing send with cl=ALL, and if send fails (wholly or partially), retain the batch in the log. This is still a slight behavioural difference, and we also risk filling up the batch log in extreme cases. (Though probably not in any real environment). Refs #1222 Message-Id: <1466444170-23797-1-git-send-email-calle@scylladb.com>	2016-06-21 09:41:09 +03:00
Calle Wilund	7cdea1b889	commitlog: Use flush queue for write/flush ordering, improve batch Using an ordering mechanism better than rw-locks for write/flush means we can wait for pending write in batch mode, and coalesce data from more than one mutation into a chunk. It also means we can wait for a specific read+flush pair (based on file position). Downside is that we will not do parallel writes in batch mode (unless we run out of buffer), which might underutilize the disk bandwidth. Upside is that running in batch mode (i.e. per-write consistency) now has way better bandwidth, and also, at least with high mutation rate, better average latency. Message-Id: <1465990064-2258-1-git-send-email-calle@scylladb.com>	2016-06-20 13:09:16 +03:00
Tomasz Grabiec	75f899cc93	lsa: Make reclamation step configurable via config	2016-06-14 15:13:15 +02:00
Gleb Natapov	9635e67a84	config: adjust boost::program_options validator to work with db::string_map Fixes #1320 Message-Id: <20160607064511.GX9939@scylladb.com>	2016-06-07 10:42:27 +03:00
Gleb Natapov	9132604a90	config: make string_map to be a unique type instead of an alias to unordered_map Config provides operators << >> for string_map which makes it impossible to have generic stream operators for unordered_map. Fix it by making string_map a separate type and not just an alias. Message-Id: <20160602102642.GJ9939@scylladb.com>	2016-06-02 13:28:40 +03:00
Gleb Natapov	1476becd28	config: put operators << and >> into db namespace Makes ADL find the right version of the overload. Message-Id: <20160601130952.GJ2381@scylladb.com>	2016-06-02 10:45:01 +03:00
Avi Kivity	b50cb3eca8	config: rename compact_on_idle compact_on_idle will lead users to thinking we're talking about sstable compaction, not log-structured-allocator compaction. Rename the variable to reduce the probability of confusion. Message-Id: <1464261650-14136-1-git-send-email-avi@scylladb.com>	2016-05-30 08:39:13 +03:00
Piotr Jastrzebski	136b8148d2	Use idle CPU to compact LSA memory Register an idle CPU handler that compacts a single segment every time there's nothing better to execute on CPU. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c26aa608a1e0752fb9e6db1833ef3ba1de95f161.1464169748.git.piotr@scylladb.com>	2016-05-26 12:43:53 +03:00
Tomasz Grabiec	f0c2b1d161	config: Fix typos Message-Id: <1464201938-4778-1-git-send-email-tgrabiec@scylladb.com>	2016-05-26 08:19:57 +03:00
Calle Wilund	8cdf4e37fb	schema_tables: Fix merge_keyspaces to handle alter keyspace Must keep "altered" alive into the call chain.	2016-05-10 14:32:51 +00:00
Pekka Enberg	f6da9bc92b	Merge "Additional mutations/queries related collectd metrics" from Vlad "This series introduces some additional metrics (mostly) in a storage_proxy and a database level that are meant to create a better picture of how data flows in the cluster. First of all where possible counters of each category (e.g. total writes in the storage proxy level) are split into the following categories: - operations performed on a local Node - operations performed on remote Nodes aggregated per DC In a storage_proxy level there are the following metrics that have this "split" nature (all on a sending side): - total writes (attempts/errors) - writes performed as a result of a Read Repair logic - total data reads (attempts/completed/errors) - total digest reads (attempts/completed/errors) - total mutations data reads (attempts/completed/errors) In a batchlog_manager: - writes performed as a result of a batchlog replay logic Thereby if for instance somebody wants to get an idea of how many writes the current Node performs due to user requested mutations only he/she has to take a counter of total writes and subtract the writes resulting from Read Repairs and batchlog replays. On a receiving side of a storage_proxy we add the two following counters: - total number of received mutations - total number of forwarded mutations (attempts/errors) In order to get a better picture of what is going on on a local Node we are adding two counters on a database level: - total number of writes - total number of reads Comparing these to total writes/reads in a storage_proxy may give a good idea if there is an excessive access to a local DB for example."	2016-04-21 15:58:45 +03:00
Vlad Zolotarov	4ef5b11e9b	batchlog_manager: add a counter for a total number of write attempts Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-04-21 11:29:21 +03:00
Duarte Nunes	bc90d6a730	udt: type_parser handles user defined types This patch ensures type_parser can handle user defined types. It also prefixes user_type_impl::make_name() with org.apache.cassandra.db.marshal.UserType. Fixes #631 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 18:07:07 +02:00
Duarte Nunes	809b45e160	udt: Add drop type statement Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 18:07:02 +02:00
Duarte Nunes	d1f215b743	udt: Merge user defined type mutations This patch implements the merge_types() function, allowing mutations to user defined types to be applied. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 09:54:06 +02:00
Duarte Nunes	d6d29f7c52	schema: Replace ad hoc func with indirect_equal_to Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 09:54:06 +02:00
Duarte Nunes	dd75fe8ec0	udt: Add mutations for user defined types This patch implements mutations for user defined types. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 09:54:06 +02:00
Duarte Nunes	c7b3a4b144	udt: Parse user types system table This patch loads and parses the user types system table during bootstrap. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 09:54:06 +02:00
Calle Wilund	a7e1af1c06	db::config: Add permissions cache entries/mark auth/perm as used	2016-04-19 11:49:05 +00:00
Gleb Natapov	6f13715f8c	storage_proxy: add logging to read executor creation path Message-Id: <1460549369-29523-4-git-send-email-gleb@scylladb.com>	2016-04-14 14:58:02 +03:00
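
Note on commit 9cc076c9f3 (storage_proxy: preserve endpoint's order while filtering local nodes for query): the message relies on the fact that std::partition() makes no promise about the relative order of elements inside each group, while std::stable_partition() keeps it. The sketch below illustrates that property only. The endpoint struct and the keep_local_in_order() helper are hypothetical names made up for this note and are not Scylla's actual filter_for_query() code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for an endpoint; not Scylla's actual type.
struct endpoint {
    std::string address;
    bool is_local;
};

// Takes endpoints already sorted by preference, drops the non-local ones,
// and returns the local ones in their original relative order.
std::vector<endpoint> keep_local_in_order(std::vector<endpoint> sorted_by_preference) {
    // stable_partition moves local endpoints to the front and preserves the
    // relative order within both groups; std::partition would not guarantee that.
    auto first_remote = std::stable_partition(
        sorted_by_preference.begin(), sorted_by_preference.end(),
        [](const endpoint& e) { return e.is_local; });
    // Everything from first_remote onwards is non-local, so discard it.
    sorted_by_preference.erase(first_remote, sorted_by_preference.end());
    return sorted_by_preference;
}
```

Calling keep_local_in_order() on the preference-ordered list a(local), b(remote), c(local), d(local) yields a, c, d in that order, which is the ordering guarantee the commit message asks for.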


678 Commits