scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-07 07:23:15 +00:00

Author	SHA1	Message	Date
Pekka Enberg	a772938e73	transport/server: Round-robin CQL request load balancing Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-10-27 13:24:58 +02:00
Avi Kivity	0754e9a34d	Merge "Even more commitlog fixes" from Calle "Fixes for commitlog (debug) test failures related to shutdowns. Note that most of the fixes here are only really related to the tests failing, not really real scylla runs. However, at some point we'll have real shutdown in scylla as well (not just hard exit), at which point this becomes more relevant there as well. Main issue was post-flush continuation chains for stats update remaining unexecuted, due to task reordering, once the commitlog object itself had been destroyed. This could have been handled by just making the stats object a shared pointer, but in general it seems more prudent to enforce having all tasks completed after shutdown. * Change commitlog shutdown to use gate+wait for all outstanding ops (flush, write, timer). Thus we can ensure everything is finished when returning from "shutdown". * Fix bug with "commitlog::clear" (test method) not doing the intended deed * Most importantly, fix the tests themselves, cleaning up old crud, and fixing invalid assumptions (CL behaviour changed quite a bit since tests were created), and remove races. Disclaimer: I've _never_ managed to reproduce the debug tests failing like in jenkins locally (though I managed to provoke other failures), but at least jenkins runs with this series have been clean. Knock knock."	2015-10-27 12:16:20 +02:00
Calle Wilund	5299cece4c	commitlog: Make "shutdown" do flushing + hard sync of pending ops * Do close + fsync on all segments * Make sure all pending cycle/sync ops are guarded with a gate, and explicitly wait for this gate on shutdown to make sure we don't leave hanging flushes in the task queue. * Fix bug where "commitlog::clear" did not in fact shut down the CL, due to "_shutdown" being already set. Note: This is (at least currently) not an issue for anything else than tests, since we don't shutdown the normal server "properly", i.e. the CL itself will not go away, and hanging tasks are ok, as long as the sync-all is done (which it was previously). But, to make tests predictable, and future-proof the CL, this is better.	2015-10-26 14:50:54 +01:00
Vlad Zolotarov	d8de1099eb	message::messaging_service: introduce _preferred_ip_cache This map will contain the (internal) IPs corresponding to specific Nodes. The mapping is also stored in the system.peers table. So, instead of always connecting to external IP messaging_service::get_rpc_client() will query _preferred_ip_cache and only if there is no entry for a given Node will connect to the external IP. We will call for init_local_preferred_ip_cache() at the end of system table init. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Improved the _preferred_ip_cache description. - Code styling issues. New in v3: - Make get_internal_ip() public. - get_rpc_client(): return a get_preferred_ip() usage dropped in v2 by mistake during rebase.	2015-10-26 14:09:26 +02:00
Vlad Zolotarov	fd811dd707	db::system_keyspace: added get_preferred_ips() get_preferred_ips() returns all preferred_ip's stored in system.peers table. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Get rid of extra std::move().	2015-10-26 14:09:26 +02:00
Vlad Zolotarov	f2e1be0fc1	db::system_keyspace::update_preferred_ip(): use net::ipv4_address as a preferred_ip value Fixes issue #481 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-10-26 14:09:26 +02:00
Calle Wilund	05de462fa9	commitlog: Make flush/segment delete slightly more defensive + test tolerant Fix for (mainly) test failures (use-after-free) I.e. test case test_commitlog_delete_when_over_disk_limit causes use-after-free because test shuts down before a pending flush is done, and the segment manager is actually gone -> crash writing stats. Now, we could make the stats a shared pointer, but we should never allow an operation to outlive the segment_manager. In normal op, we _almost_ guarantee this with the shutdown() call, but technically, we could have a flush continuation trailing somewhere. * Make sure we never delete segments from segment_manager until they are fully flushed * Make test disposal method "clear" be more defensive in flushing and clearing out segments	2015-10-22 15:19:24 +03:00
Calle Wilund	786d66cacf	commitlog: Fix use-after-free Remove "finally". Just use a then_wrapped. Which it was originally, before "handle_exception" was introduced to seastar. Oh, the irony...	2015-10-20 09:56:40 +03:00
Tomasz Grabiec	19d7d30e67	Replace references to 'urchin' with 'scylla'	2015-10-19 11:08:05 +03:00
Raphael S. Carvalho	a21af32eed	db: do not ignore compaction strategy class When building the in-memory schema for a column family, we were ignoring compaction strategy class because of a bug in the existing code. Example: suppose that you create a column family with leveled compaction strategy. This option would be ignored and the default strategy (size-tiered) would be used instead. Found this problem while working on leveled compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>	2015-10-18 11:06:37 +03:00
Glauber Costa	e99e418238	schema_tables: make sure CF directory exists upon creation In Cassandra, when you create a new column family, a directory for it immediately appears under the KS directory. In the past, we have made a decision to delay that creation until the first SSTable is created, which works well in general. There is a problem, however, for backup restoration: the standard procedure to call loadNewSSTables is to do that in an empty directory. But the directory simply won't be there until we create the first SSTable: bummer! In the current incarnation of the code in schema_tables.cc, there is already some code that runs on CPU0 only. That is a perfect place for the directory creation. So let's do it. After this patch, a directory for the CF appears right after the CF creation. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-17 13:08:07 +02:00
Avi Kivity	849464670c	commitlog: make new segments more xfs-friendly xfs doesn't like writes beyond eof (exactly at eof is fine), and due to continuation reordering, we sometimes do that. Fix by pre-truncating the segment to its maximum size.	2015-10-14 17:32:59 +03:00
Calle Wilund	206acd8b5b	commitlog: Make reader handle pre-allocated files Silently ignore, and assume eof if reading zeroed file or chunk header data Reading entries already deal with this.	2015-10-14 17:32:23 +03:00
Calle Wilund	2729d5dd71	commitlog: ensure file size remains <= max_size Re-check file size overflow after each cycle() call (new buffer), otherwise we could write more, in the case we are storing a mutation larger than current buffer size (current pos + sizeof(mut) < max_size, but after cycle required by sizeof(mut) > buf_remain, the former might not be true anymore.	2015-10-14 17:32:22 +03:00
Avi Kivity	e252475e67	Merge "locator: Adding EC2Snitch" from Vlad "This series adds EC2Snitch. Since both GossipingPropertyFileSnitch and the EC2SnitchXXX snitch family use the same property file it was logical to share the corresponding code. Most of this series does just that... "	2015-10-11 14:55:26 +03:00
Glauber Costa	b2fef14ada	do not calculate truncation time independently Currently, we are calculating truncated_at during truncate() independently for each shard. It will work if we're lucky, but it is fairly easy to trigger cases in which each shard will end up with a slightly different time. The main problem here, is that this time is used as the snapshot name when auto snapshots are enabled. Previous to my last fixes, this would just generate two separate directories in this case, which is wrong but not severe. But after the fix, this means that both shards will wait for one another to synchronize and this will hang the database. Fix this by making sure that the truncation time is calculated before invoke_on_all in all needed places. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-10-09 17:17:11 +03:00
Vlad Zolotarov	de6cf8db51	db::config: add get_conf_dir() This function returns the directory containing the configuration files. It takes into account the environment variables as follows: - If SCYLLA_CONF is defined - this is the directory - else if SCYLLA_HOME is defined, then $SCYLLA_HOME/conf is the directory - else "conf" is the directory, namely the configuration files should be looked for at ./conf Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Updated get_conf_dir() description.	2015-10-08 20:57:11 +03:00
Pekka Enberg	95012793e5	db/schema_tables: Wire up drop keyspace notifications Signed-off-by: Pekka Enberg <penberg@scylladb.com>	2015-10-08 13:10:48 +02:00
Calle Wilund	42c086a5cd	batchlog_manager: Fixup includes + exception handling * Fix exception handling in batch loop (report + still re-arm) * Cleanup seastar include reference style	2015-10-07 17:06:34 +03:00
Calle Wilund	a4c14d3d1d	batchlog_manager: Add hint of which cpu timer callback is running on	2015-10-07 14:57:55 +02:00
Calle Wilund	b46496da34	batchlog_manager: Rename logger * More useful/referrable on command line (--log) Matches class name (though not origin)	2015-10-07 14:30:09 +02:00
Calle Wilund	6f94a3bdad	batchlog_manager: Use gate instead of semaphore Since that exists now.	2015-10-07 14:30:09 +02:00
Calle Wilund	874da0eb67	batchlog_manager: Run timer loop on only one shard Since replay is a "node global" operation, we should not attempt to do it in parallel on each shard. It will just overlap/interfere. Could just run this on cpu 0, but since this _could_ be a lengthy operation, each timer callback is round-robined across shards just in case...	2015-10-07 14:30:09 +02:00
Calle Wilund	246e8e24f2	replay_position: Make <= comparator simpler and cleaner	2015-10-07 14:34:22 +03:00
Calle Wilund	a66c22f1ec	commitlog_replayer: Acquire truncation RP:s per replayed shard I.e. get them in bulk and fill in for all shards	2015-10-07 09:00:22 +02:00
Calle Wilund	17bd18b59c	commitlog_replayer: Add logging message for exceptions in multi-file recover	2015-10-07 08:59:54 +02:00
Calle Wilund	3f1fa77979	commitlog_replayer: Fix broken comparison A commitlog entry should be ignored if its position is <= highest recorded position, not <.	2015-10-07 08:59:53 +02:00
Calle Wilund	271eb3ba02	replay_position: Add <= comparator	2015-10-07 08:59:53 +02:00
Calle Wilund	6b0ab79ecb	system_keyspace: Keep per-shard truncation records Fixes #423 * CF ID now maps to a truncation record comprised of a set of per-shard RP:s and a high-mark timestamp * Retrieving RP:s are done in "bulk" * Truncation time is calculated as max of all shards. This version of the patch will accept "old" truncation data, though the result of applying it will most likely not be correct (just one shard) Record is still kept as a blob, "new" format is indicated by record size.	2015-10-07 08:59:52 +02:00
Calle Wilund	199b72c6f3	commitlog: fix reader "offset" handling broken + ensure exceptions propagate Must ensure we find a chunk/entry boundary still even when run with a start offset, since file navigation is chunk based. Was not observed as broken previously because 1.) We did not run with offsets 2.) The exception never reached caller. Also make the reader silently ignore empty files.	2015-10-07 08:54:49 +02:00
Calle Wilund	024041c752	commitlog: make log message slightly more informative/correct	2015-10-07 08:54:49 +02:00
Pekka Enberg	5878f62b18	db/schema_tables: Clean up indentation Almost the whole file is (accidentally) indented four spaces to the right for no reason. Fix that up because it's annoying as hell. Signed-off-by: Pekka Enberg <penberg@scylladb.com>	2015-10-06 17:09:27 +02:00
Pekka Enberg	1f9e769dd3	db/schema_tables: Remove obsolete ifdef'd code Remove ifdef'd code that we won't be converting to C++ because of design differences. Signed-off-by: Pekka Enberg <penberg@scylladb.com>	2015-10-06 17:09:27 +02:00
Pekka Enberg	6e304cd58c	db/schema_tables: Fix merge_keyspaces() to actually drop keyspaces When we query schema keyspaces after we have applied a delete mutation, the dropped keyspace does not exist in the "after" result set. Fix the merge_keyspaces() algorithm to take that into account. Makes merge_keyspaces() really call to database::drop_keyspace() when a keyspace is dropped. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-10-06 14:53:35 +03:00
Pekka Enberg	5d9d1e28cb	db/schema_tables: Implement make_drop_keyspace_mutations() Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-10-06 14:53:35 +03:00
Pekka Enberg	633279415d	db/schema_tables: Fix merge_tables() to actually drop tables When we query schema tables after we have applied a delete mutation, the dropped table does not exist in the "after" result set. Fix the merge_tables() algorithm to take that into account. Makes merge_tables() really call to database::drop_column_family() when a table is dropped. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-10-06 11:28:55 +03:00
Pekka Enberg	82d20dba65	db/schema_tables: Implement make_drop_table_mutations() Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-10-06 11:28:55 +03:00
Pekka Enberg	b89b70daa8	db/schema_tables: Wire up drop column notifications Signed-off-by: Pekka Enberg <penberg@scylladb.com>	2015-10-06 11:28:55 +03:00
Pekka Enberg	0651ab6901	database: Futurize drop_column_family() function Futurize drop_column_family() so that we can call truncate() from it. Signed-off-by: Pekka Enberg <penberg@scylladb.com>	2015-10-06 11:28:55 +03:00
Pekka Enberg	b74a9d99d5	db/schema_tables: Fix UTF-8 serialization Use the utf8_type to serialize strings instead of using to_bytes(). Signed-off-by: Pekka Enberg <penberg@scylladb.com>	2015-10-05 09:26:15 +02:00
Calle Wilund	7856d7fe02	config: Change "auto_snapshot" to "used"	2015-09-30 09:09:42 +02:00
Calle Wilund	b3c95ce42d	system_keyspace: Change truncation record method to use context qp Align with rest of file (for better or worse). This allows calls from entity without query_processor handy (i.e. storage_proxy). Added "minimal" setup method for the "global" state, to facilitate tests. Doing a full setup either in cql_test_env or after it is created breaks badly. (Not sure why). So quick workaround. Updated the current two users (batchlog_manager and commitlog_replayer) callsites to conform.	2015-09-30 09:09:41 +02:00
Calle Wilund	3abd8b38b6	query_context: Expose query_processor (local)	2015-09-30 09:09:41 +02:00
Avi Kivity	0ec0e32014	Merge "commitlog: preallocate segments" from Calle "Modified version of the initial patch (which was reverted), further reducing the possible delay states in CL allocation and segment management."	2015-09-29 17:02:54 +03:00
Pekka Enberg	f43f0d6f04	keys: Add compound_wrapper::from_singular() Clean up code by adding a from_singular() helper function to compound wrapper and use it in.	2015-09-28 16:29:44 +02:00
Calle Wilund	4941d91063	Commitlog: add some more verbosity	2015-09-22 12:57:33 +02:00
Paweł Dziepak	34e66e60c1	main: disable thrift by default Fixes #205. Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-09-22 09:48:44 +02:00
Calle Wilund	a10745cf0e	Commitlog: Delay timer by period/ncpus for each cpu To avoid having all shards doing sync at the same time.	2015-09-21 13:30:35 +02:00
Calle Wilund	dcabf8c1d2	Commitlog: Pre-allocate "reserve" segments Refs #356 Pre-allocates N segments from timer task. N is "adaptive" in that it is increased (to a max) every time segment acquisition is forced to allocate a new one instead of picking from the pre-alloc (reserve) list. The idea is that it is easier to adapt how many segments we consume per timer quanta than the timer quanta itself. Also does disk pressure check and flush from timer task now. Note that the check is still only done max once every new segment. Some logging cleanup/betterment also to make behaviour easier to trace. Reserve segments start out at zero length, and are still deleted when finished. This is because otherwise we'd still have to clear the file to be able to properly parse it later (given that it can be a "half" file due to power fail etc). This might need revisiting as well. With this patch, there should be no case (except flush starvation) where "add_mutation" actually waits for a (potentially) blocking op (disk). Note that since the amount of reserve is increased as needed, there will be occasional cases where a new segment is created in the alloc path until the system finds equilibrium. But this should only be during a brief warmup. v2: Fixed timestamp not being reset on reserve acquire	2015-09-21 13:04:39 +02:00
Pekka Enberg	6cef7d8270	db/schema_tables: Fix calculate_schema_digest() map_reduce() can run the reducer out-of-order which breaks the MD5 hash. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com> Fixes #357. [tgrabiec]	2015-09-21 11:51:17 +02:00

4972 Commits