scylladb

Author	SHA1	Message	Date
Tomasz Grabiec	586dbaa8d3	db: Replace virtual_reader_type with mutation_source_opt Virtual reader is a mutation_source.	2017-02-23 18:23:52 +01:00
Calle Wilund	ef26ab0e1b	db::system_keyspace: Find rpc_address by lookup	2017-02-06 09:45:37 +00:00
Duarte Nunes	40c684b5f5	database: Extract common create cf code This patch moves some duplicate code into the add_column_family_and_create_directory() function. It also saves some superfluous keyspace lookups and readies the code to be used by materialized views. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Glauber Costa	db7cc3cba8	system keyspace: write batchlog mutation in user memory Batchlog is a potentially memory-intensive table whose workload is driven by user needs, not system's. Move it to the user dirty memory manager. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-12-13 13:59:35 -05:00
Duarte Nunes	6a37d87c76	db: Delete size_estimates_recorder Now that access to the size_estimates system is virtualized, we no longer need the recorder. Fixes #1616 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-11-21 11:15:05 +00:00
Duarte Nunes	225648780d	size_estimates: Add virtual reader This patch adds a virtual mutation_reader so that queries to the size_estimates system table are handled by the engine without needing to perform any IO. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-11-21 11:15:05 +00:00
Duarte Nunes	636287fdf2	system_keyspace: Build mutations for size estimates This patch adds a function to system_keyspace responsible for creating a mutation to a partition of the size_estimates system table from a set of range_estimates. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-11-21 11:15:04 +00:00
Duarte Nunes	18ddec245e	size_estimates: Store the token range as bytes This patch changes the range_estimates struct so that the tokens are represented as utf8 encoded bytes. This will make future patches require fewer conversions. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-11-21 11:14:21 +00:00
Duarte Nunes	e7a5162c1d	range_estimates: Add schema This will be used in future patches, when virtualizing the size_estimates system table. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-11-21 10:56:32 +00:00
Tomasz Grabiec	c1a7e2090e	Revert "database: change find_column_families signature so it returns a lw_shared_ptr" This reverts commit `f3528ede65`.	2016-11-04 10:48:21 +01:00
Glauber Costa	f3528ede65	database: change find_column_families signature so it returns a lw_shared_ptr There are places in which we need to use the column family object many times, with deferring points in between. Because the column family may have been destroyed in the deferring point, we need to go and find it again. If we use lw_shared_ptr, however, we'll be able to at least guarantee that the object will be alive. Some users will still need to check, if they want to guarantee that the column family wasn't removed. But others that only need to make sure we don't access an invalid object will be able to avoid the cost of re-finding it just fine. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <722bf49e158da77ff509372c2034e5707706e5bf.1478111467.git.glauber@scylladb.com>	2016-11-03 13:27:31 +01:00
Avi Kivity	c94fb1bf12	build: reduce inclusions of messaging_service.hh Remove inclusions from header files (primary offender is fb_utilities.hh) and introduce new messaging_service_fwd.hh to reduce rebuilds when the messaging service changes. Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>	2016-10-05 11:46:49 +03:00
Duarte Nunes	e0a43a82c6	system_keyspace: Correctly deal with wrapped ranges This patch ensures we correctly deal with ranges that wrap around when querying the size_estimates system table. Ref #693 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1470412433-7767-1-git-send-email-duarte@scylladb.com>	2016-08-05 19:17:00 +03:00
Duarte Nunes	ecfa04da77	system_keyspace: Add query_size_estimates() function The query_size_estimates() function queries the size_estimates system table for a given keyspace and table, filtering out the token ranges according to the specified tokens. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-24 22:43:58 +00:00
Duarte Nunes	e16f3f2969	system_keyspace: Avoid pointers in range_estimates This patch makes range_estimates a proper struct, where tokens are represented as dht::tokens rather than dht::ring_position*. We also pass other arguments to update_ and clear_size_estimates by copy, since one will already be required. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-24 22:43:35 +00:00
Piotr Jastrzebski	636a4acfd0	Add flag to configure max size of a cached partition. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-07-21 09:47:20 +02:00
Vlad Zolotarov	baa6496816	service::storage_proxy: READ instrumentation: store trace state object in abstract_read_executor Having a trace_state_ptr in the storage_proxy level is needed to trace code bits in this level. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:59 +03:00
Duarte Nunes	f8f61cf246	system_keyspace: Record and clear size estimates This patch implements functions that allow the size_estimates system table to be updated and cleared. The size_estimates table is updated per schema with a set of token ranges and the associated estimations of how many partitions there are and their mean size. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-18 23:58:31 +00:00
Glauber Costa	7169b727ea	move system tables to its own region In the spirit of what we are doing for the read semaphore, this patch moves system writes to its own dirty memory manager. Not only will it make sure that system tables will not be serialized by its own semaphore, but it will also put system tables in its own region group. Moving system tables to its own region group has the advantage that system requests won't be waiting during throttle behind a potentially big queue of user requests, since requests are tended to in FIFO order within the same region group. However, system tables being more controlled and predictable, we can actually go a step further and give them some extra reservation so they may not necessarily block even if under pressure (up to 10 MB more). Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 17:46:28 -04:00
Avi Kivity	76cc6408cd	Merge "feature check for seed node" from Asias "This series implements feature check for seed node.	2016-07-05 19:01:01 +03:00
Asias He	6f69963ef9	system_keyspace: Simplify load_host_ids implementation - Use plain loop instead of do_for_each - Use row.get_as() instead of row.template get_as() Message-Id: <3e108d3a6258c0caaf569eb9c79532d9789ea411.1467703722.git.asias@scylladb.com>	2016-07-05 09:47:21 +02:00
Asias He	3f31be58b6	system_keyspace: Simplify load_tokens implementation - Use plain loop instead of do_for_each - Use row.get_as() instead of row.template get_as() Message-Id: <f959ace4f30078695d383c849ed4520169228f97.1467703722.git.asias@scylladb.com>	2016-07-05 09:47:21 +02:00
Asias He	31df4e5316	system_keyspace: Introduce load_peer_features To get the peer features stored in the system.peers table.	2016-07-05 10:09:53 +08:00
Avi Kivity	9ac730dcc9	mutation_reader: make restricting_mutation_reader even more restricting While limiting the number of concurrently executing sstable readers reduces our memory load, the queued readers, although consuming a small amount of memory, can still grow without bounds. To limit the damage, add two limits on the queue: - a timeout, which is equal to the read timeout - a queue length limit, which is equal to 2% of the shard memory divided by an estimate of the queued request size (1kb) Together, these limits bound the amount of memory needed by queued disk requests in case the disk can't keep up. Message-Id: <1467206055-30769-1-git-send-email-avi@scylladb.com>	2016-06-29 15:17:35 +02:00
Avi Kivity	edeef03b34	db: restrict replica read concurrency Since reading mutations can consume a large amount of memory, which, moreover, is not predictable at the time the read is initiated, restrict the number of reads to 100 per shard. This is more than enough to saturate the disk, and hopefully enough to prevent allocation failures. Restriction is applied in column_family::make_sstable_reader(), which is called either on a cache miss or if the cache is disabled. This allows cached reads to proceed without restriction, since their memory usage is supposedly low. Reads from the system keyspace use a separate semaphore, to prevent user reads from blocking system reads. Perhaps we should select the semaphore based on the source of the read rather than the keyspace, but for now using the keyspace is sufficient.	2016-06-27 17:17:56 +03:00
Pekka Enberg	47a904c0f6	Merge "gossip: Introduce SUPPORTED_FEATURES" from Asias "There is a need to have an ability to detect whether a feature is supported by entire cluster. The way to do it is to advertise feature availability over gossip and then each node will be able to check if all other nodes have a feature in question. The idea is to have new application state SUPPORTED_FEATURES that will contain set of strings, each string holding feature name. This series adds API to do so. The following patch on top of this series demonstrates how to wait for features during boot up. FEATURE1 and FEATURE2 are introduced. We use wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2 successfully. Since FEATURE3 is not supported, the wait will not succeed, the wait will timeout. --- a/service/storage_service.cc +++ b/service/storage_service.cc @@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() { // Add features supported by this local node. When a new feature is // introduced in scylla, update it here, e.g., // return sstring("FEATURE1,FEATURE2") - return sstring(""); + return sstring("FEATURE1,FEATURE2"); } std::set<inet_address> get_seeds() { @@ -212,6 +212,11 @@ void storage_service::prepare_to_join() { // gossip snitch infos (local DC and rack) gossip_snitch_info().get(); + gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get(); + logger.info("Wait for FEATURE1 and FEATURE2 done"); + gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get(); + logger.info("Wait for FEATURE3 done"); + We can query the supported_features: cqlsh> SELECT supported_features from system.peers; supported_features -------------------- FEATURE1,FEATURE2 FEATURE1,FEATURE2 (2 rows) cqlsh> SELECT supported_features from system.local; supported_features -------------------- FEATURE1,FEATURE2 (1 rows)"	2016-04-08 09:22:50 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Asias He	50bcfe569a	system_keyspace: Add supported_features into system.local table	2016-04-06 07:12:34 +08:00
Asias He	214c0f72b2	db: Add supported_features column in system.local and system.peers table	2016-04-06 07:12:34 +08:00
Asias He	abafec99a5	system_keyspace: Implement increment_and_get_generation	2016-02-29 16:31:42 +08:00
Tomasz Grabiec	6709c0ac15	cql_serialization_format: Make it CQL protocol version aware We want to serialize it as a single number, the CQL binary protocol version to which it corresponds, so it needs to be aware of the version number.	2016-02-15 17:05:55 +01:00
Calle Wilund	ce66acc771	system_keyspace: Always retain highest truncation time stamp Since the table is written from all shards, and we possibly might have conflicting time stamps, we define the truncated_at time as the highest time point. I.e. conservative.	2016-02-09 15:45:37 +00:00
Calle Wilund	1c213e1f38	system_keyspace: Use IDL types + better verification of truncation record Truncation records are not portable between us and Origin. We need to detect and ensure we neither try to use, and more to the point, don't crash because of data format error when loading, origin records from a migrated system. This problem was seen by Tzach when doing a migration from an origin setup. Updated record storage to use IDL-serialized types + added versioning and magic marking + odd-size-checking to ensure we load only correct data. The code will also deal with records from an older version of scylla.	2016-02-09 15:45:37 +00:00
Tomasz Grabiec	4e5a52d6fa	db: Make read interface schema version aware The intent is to make data returned by queries always conform to a single schema version, which is requested by the client. For CQL queries, for example, we want to use the same schema which was used to compile the query. The other node expects to receive data conforming to the requested schema. Interface on shard level accepts schema_ptr, across nodes we use table_schema_version UUID. To transfer schema_ptr across shards, we use global_schema_ptr. Because schema is identified with UUID across nodes, requestors must be prepared for being queried for the definition of the schema. They must hold a live schema_ptr around the request. This guarantees that schema_registry will always know about the requested version. This is not an issue because for queries the requestor needs to hold on to the schema anyway to be able to interpret the results. But care must be taken to always use the same schema version for making the request and parsing the results. Schema requesting across nodes is currently stubbed (throws runtime exception).	2016-01-11 10:34:52 +01:00
Tomasz Grabiec	04eb58159a	query: Add schema_version field to read_command	2016-01-11 10:34:51 +01:00
Tomasz Grabiec	f58c2dec1e	schema: Make schema objects versioned The version needs to change value not only on structural changes but also temporal. This is needed for nodes to detect if the version they see was already synchronized with or not even if it has the same structure as the past versions. We also need to end up with the same version on all nodes when schema changes are commuted. For regular mutable schemas version will be calculated from underlying mutations when schema is announced. For static schemas of system keyspace it is calculated by hashing scylla version and column id, because we don't have mutations at the time of building the schema.	2016-01-08 21:10:26 +01:00
Asias He	4952042fbf	tests: Fix cql_test_env.cc Current service initialization is a total mess in cql_test_env. Start the service the same order as in main.cc. Fixes #715, #716 './test.py --mode release' passes.	2016-01-01 10:15:17 +08:00
Raphael S. Carvalho	433ed60ca3	db: add method to get compaction history This method is intended to return content of the system table COMPACTION_HISTORY as a vector of compaction_history_entry. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2015-12-14 14:19:04 -02:00
Raphael S. Carvalho	f3beacac28	db: add method to update the system table COMPACTION_HISTORY It's supposed to be called at the end of compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2015-12-14 13:47:10 -02:00
Asias He	e79c85964f	system_keyspace: Flush system.peers in remove_endpoint 1) Start node 1, node 2, node 3 2) Stop node 3 3) Start node 4 to replace node 3 4) Kill node 4 (removal of node 3 in system.peers is not flushed to disk) 5) Start node 4 (will load node 3's token and host_id info in bootup) This makes "Token .* changing ownership from 127.0.0.3 to 127.0.0.4" messages printed again in step 5) which are not expected, which fails the dtest FAIL: replace_first_boot_test (replace_address_test.TestReplaceAddress) ---------------------------------------------------------------------- Traceback (most recent call last): File "scylla-dtest/replace_address_test.py", line 220, in replace_first_boot_test self.assertEqual(len(movedTokensList), numNodes) AssertionError: 512 != 256	2015-12-09 12:30:52 +08:00
Asias He	ccbd801f40	storage_service: Fix decommissioned nodes are willing to rejoin the cluster if restarted Backport: CASSANDRA-8801 a53a6ce Decommissioned nodes will not rejoin the cluster. Tested with: topology_test.py:TestTopology.decommissioned_node_cant_rejoin_test	2015-12-09 10:43:51 +08:00
Avi Kivity	47499dcf18	data_value: make conversion from bytes explicit Since bytes is a very generic value that is returned from many calls, it is easy to pass it by mistake to a function expecting a data_value, and to get a wrong result. It is impossible for the data_value constructor to know if the argument is a genuine bytes variable, a data_value of another type, but serialized, or some other serialized data type. To prevent misuse, make the data_value(bytes) constructor (and complementary data_value(optional<bytes>) explicit.	2015-11-13 17:12:29 +02:00
Tomasz Grabiec	3c4c83c66f	cql_test_env: Initialize system keyspace	2015-11-09 08:42:53 +08:00
Avi Kivity	2c3591cbd9	data_value de-any-fication We use boost::any to convert to and from database values (stored in serialized form) and native C++ values. boost::any captures information about the data type (how to copy/move/delete etc.) and stores it inside the boost::any instance. We later retrieve the real value using boost::any_cast. However, data_value (which has a boost::any member) already has type information as a data_type instance. By teaching data_type instances about the corresponding native type, we can eliminate the use of boost::any. While boost::any is evil and eliminating it improves efficiency somewhat, the real goal is growing native type support in data_type. We will use that later to store native types in the cache, enabling O(log n) access to collections, O(1) access to tuples, and more efficient large blob support.	2015-10-30 17:38:51 +01:00
Vlad Zolotarov	d8de1099eb	message::messaging_service: introduce _preferred_ip_cache This map will contain the (internal) IPs corresponding to specific Nodes. The mapping is also stored in the system.peers table. So, instead of always connecting to external IP messaging_service::get_rpc_client() will query _preferred_ip_cache and only if there is no entry for a given Node will connect to the external IP. We will call for init_local_preferred_ip_cache() at the end of system table init. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Improved the _preferred_ip_cache description. - Code styling issues. New in v3: - Make get_internal_ip() public. - get_rpc_client(): return a get_preferred_ip() usage dropped in v2 by mistake during rebase.	2015-10-26 14:09:26 +02:00
Vlad Zolotarov	fd811dd707	db::system_keyspace: added get_preferred_ips() get_preferred_ips() returns all preferred_ip's stored in system.peers table. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Get rid of extra std::move().	2015-10-26 14:09:26 +02:00
Vlad Zolotarov	f2e1be0fc1	db::system_keyspace::update_preferred_ip(): use net::ipv4_address as a preferred_ip value Fixes issue #481 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-10-26 14:09:26 +02:00
Calle Wilund	6b0ab79ecb	system_keyspace: Keep per-shard truncation records Fixes #423 * CF ID now maps to a truncation record comprised of a set of per-shard RP:s and a high-mark timestamp * Retrieving RP:s are done in "bulk" * Truncation time is calculated as max of all shards. This version of the patch will accept "old" truncation data, though the result of applying it will most likely not be correct (just one shard) Record is still kept as a blob, "new" format is indicated by record size.	2015-10-07 08:59:52 +02:00
Calle Wilund	b3c95ce42d	system_keyspace: Change truncation record method to use context qp Align with rest of file (for better or worse). This allows calls from entity without query_processor handy (i.e. storage_proxy). Added "minimal" setup method for the "global" state, to facilitate tests. Doing a full setup either in cql_test_env or after it is created breaks badly. (Not sure why). So quick workaround. Updated the current two users (batchlog_manager and commitlog_replayer) callsites to conform.	2015-09-30 09:09:41 +02:00

146 Commits