scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 18:40:38 +00:00

Author	SHA1	Message	Date
Glauber Costa	139a2d14a1	disable defragment-memory-on-idle-by-default It's been linked with various performance issues, either by causing them or making them worse. One example is #1634, and also recently I have investigated continuous performance degradation that was also linked to defrag on idle activity. Until we can figure out how to reduce its impact, we should disable it. Signed-off-by: Glauber Costa <glauber@glauber.scylladb> Message-Id: <20170627201109.10775-1-glauber@scylladb.com> (cherry picked from commit `f3742d1e38`)	2017-07-10 19:25:12 +03:00
Tomasz Grabiec	47b1e39410	commitlog: Discard active but unused segments on shutdown So that they are not left on disk even though we did a clean shutdown. First part of the fix is to ensure that closed segments are recognized as not allocating (_closed flag). Not doing this prevents them from being collected by discard_unused_segments(). Second part is to actually call discard_unused_segments() on shutdown after all segments were shut down, so that those whose position are cleared can be removed. Fixes #2550. Message-Id: <1499358825-17855-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `6555a2f50b`)	2017-07-10 12:40:43 +03:00
Duarte Nunes	60af7eab10	udt: Don't check a type is unused after applying the schema mutations This patch is based on `6c8b5fc`. It moves the check whether a dropped type is still used by other types or tables from schema_tables to the drop_type_statement, as delaying this check to after applying the mutations can leave the keyspace in a broken state. Fixes #2490 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1497466736-28841-1-git-send-email-duarte@scylladb.com>	2017-06-15 10:35:01 +03:00
Paweł Dziepak	7bb41b50f9	commitlog: avoid copying column_mapping It is safe to copy column_mapping across shards. Such guarantee comes at the cost of performance. This patch makes commitlog_entry_writer use IDL generated writer to serialise commitlog_entry so that column_mapping is not copied. This also simplifies commitlog_entry itself. Performance difference tested with: perf_simple_query -c4 --write --duration 60 (medians) before after diff write 79434.35 89247.54 +12.3% (cherry picked from commit `374c8a56ac`) Also: Fixes #2468.	2017-06-11 15:44:20 +03:00
Paweł Dziepak	98d782cfe1	db: make virtual dirty soft limit configurable Message-Id: <20170428150005.28454-1-pdziepak@scylladb.com> (cherry picked from commit `24f4dcf9e4`)	2017-04-30 19:17:55 +03:00
Calle Wilund	9b26a57288	commitlog/replayer: Bugfix: minimum rp broken, and cl reader offset too The previous fix removed the additional insertion of "min rp" per source shard based on whether we had processed existing CF:s or not (i.e. if a CF does not exist as sstable at all, we must tag it as zero-rp, and make whole shard for it start at same zero). This is bad in itself, because it can cause data loss. It does not cause crashing however. But it did uncover another, old lingering bug, namely the commitlog reader initiating its stream wrongly when reading from an actual offset (i.e. not processing the whole file). We opened the file stream from the file offset, then tried to read the file header and magic number from there -> boom, error. Also, rp-to-file mapping was potentially suboptimal due to using bucket iterator instead of actual range. I.e. three fixes: * Reinstate min position guarding for unencountered CF:s * Fix stream creating in CL reader * Fix segment map iterator use. v2: * Fix typo Message-Id: <1490611637-12220-1-git-send-email-calle@scylladb.com> (cherry picked from commit `b12b65db92`)	2017-03-28 10:35:04 +02:00
Calle Wilund	3cc03f88fd	commitlog_replayer: Do proper const-lookup of min positions for shards Fixes #2173 Per-shard min positions can be unset if we never collected any sstable/truncation info for it, yet replay segments of that id. Wrap the lookups to handle "missing data -> default", which should have been there in the first place. Message-Id: <1490185101-12482-1-git-send-email-calle@scylladb.com> (cherry picked from commit `c3a510a08d`)	2017-03-22 17:57:30 +02:00
Calle Wilund	698a4e62d9	commitlog_replayer: Make replay parallel per shard Fixes #2098 Replay previously did all segments in parallel on shard 0, which caused heavy memory load. To reduce this and spread footprint across shards, instead do X segments per shard, sequential per shard. v2: * Fixed whitespace errors Message-Id: <1489503382-830-1-git-send-email-calle@scylladb.com> (cherry picked from commit `078589c508`)	2017-03-15 13:07:45 +02:00
Paweł Dziepak	7f17424a4e	Merge "Avoid losing changes to keyspace parameters of system_auth and tracing keyspaces" from Tomek "If a node is bootstrapped with auto_bootstrap disabled, it will not wait for schema sync before creating global keyspaces for auth and tracing. When such schema changes are then reconciled with schema on other nodes, they may overwrite changes made by the user before the node was started, because they will have higher timestamp. To prevent that, let's use minimum timestamp so that default schema always loses to manual modifications. This is what Cassandra does. Fixes #2129." * tag 'tgrabiec/prevent-keyspace-metadata-loss-v1' of github.com:scylladb/seastar-dev: db: Create default auth and tracing keyspaces using lowest timestamp migration_manager: Append actual keyspace mutations with schema notifications (cherry picked from commit `6db6d25f66`)	2017-03-08 16:31:41 +02:00
Paweł Dziepak	9f1ebd4f7c	idl/mutation: add counter serialisation logic	2017-02-02 10:35:14 +00:00
Amnon Heiman	45b6070832	Merge seastar upstream * seastar 397685c...c1dbd89 (13): > lowres_clock: drop cache-line alignment for _timer > net/packet: add missing include > Merge "Adding histogram and description support" from Amnon > reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&' > Set the option '--server' of tests/tcp_sctp_client to be required > core/memory: Remove superfluous assignment > core/memory: Remove dead code > core/reactor: Use logger instead of cerr > fix inverted logic in overprovision parameter > rpc: fix timeout checking condition > rpc: use lowres_clock instead of high resolution one > semaphore: make semaphore's clock configurable > rpc: detect timedout outgoing packets earlier Includes treewide change to accommodate rpc changing its timeout clock to lowres_clock. Includes fixup from Amnon: collectd api should use the metrics getters As part of a preparation of the change in the metrics layer, this changes the way the collectd api uses the metrics value to use the getters instead of calling the member directly. This will be important when the internal implementation will change from union to variant. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>	2017-02-01 14:39:08 +02:00
Tomasz Grabiec	634761dbba	commitlog: Fix default limit for size on disk The per-node limit will be total memory divided by number of shards instead of just total memory. For example, when Scylla is started with -c16 -m16G, the commit log will induce flushes on given shard when unflushed data exceeds on that shard 62MB instead of 1GB. Fixes #2046. Message-Id: <1485874534-10939-1-git-send-email-tgrabiec@scylladb.com>	2017-01-31 17:12:59 +02:00
Tomasz Grabiec	ddfee57c97	Replace iostream include with iosfwd in headers Message-Id: <1484656119-8386-4-git-send-email-tgrabiec@scylladb.com>	2017-01-17 14:52:44 +02:00
Tomasz Grabiec	50e3e3af08	db: Add missing include Message-Id: <1484656119-8386-3-git-send-email-tgrabiec@scylladb.com>	2017-01-17 14:52:44 +02:00
Tomasz Grabiec	ea9ab36ad5	db: Move operator<<() definition to .cc Message-Id: <1484656119-8386-2-git-send-email-tgrabiec@scylladb.com>	2017-01-17 14:52:43 +02:00
Avi Kivity	c314047b6c	config: disable new sharding algorithm It still has problems: - while resharding a very large leveled compaction strategy table, a huge amount of tiny sstables are generated, overwhelming the file descriptor limits - there is a large impact on read latency while resharding is going on (cherry picked from commit `cf27d44412`) (forward-ported from branch-1.6)	2017-01-15 10:48:53 +02:00
Vlad Zolotarov	dcdd98ccc1	db::commitlog::commitlog: move collectd counters registration to the metrics registration layer Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-01-10 16:24:54 -05:00
Vlad Zolotarov	a9f6e5f8da	db::batchlog_manager: move collectd registration to the metrics registration layer Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-01-10 16:24:54 -05:00
Pekka Enberg	3d0217ec43	db/schema_tables: Fix system keyspace table list Commit `f0c28e1` ("db/schema_tables: Add schema_functions and schema_aggregates tables") forgot to add the newly added tables to the db::schema_tables::ALL list, which is used for authorization checks, for example. Fixes the following auth_test.py dtest failures: ('Unable to connect to any servers', {'127.0.0.1': Unauthorized('Error from server: code=2100 [Unauthorized] message="User cathy has no SELECT permission on <table system.schema_functions> or any of its parents"',)}) Message-Id: <1484045277-4997-1-git-send-email-penberg@scylladb.com>	2017-01-10 13:55:04 +01:00
Pekka Enberg	f0c28e1b2d	db/schema_tables: Add schema_functions and schema_aggregates tables The 3.0.3 Java driver, for example, searches for the tables and fails when we advertise Cassandra 2.2 version from Scylla.	2017-01-09 10:42:21 +02:00
Amnon Heiman	70b2a1bfd4	Set the prometheus prefix to scylla This patch makes the prometheus prefix configurable and sets the default value to scylla. Fixes #1964 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1482671970-21487-1-git-send-email-amnon@scylladb.com>	2016-12-25 15:21:53 +02:00
Avi Kivity	3989e4ed15	Revert "config, dht: reduce default msb ignore bits to 4" This reverts commit `b81a57e8eb`. With exponential range scanning, we should now be able to survive msb ignore bits of 12, which allows better sharding on large clusters.	2016-12-20 19:41:05 +02:00
Duarte Nunes	124802e196	cql3: Add function to build view's select statement This patch adds an utility function that creates a raw select statement from a set of columns and a where clause. It is intended to be used to create the prepared select statement used by the view class. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	5bd74abee8	create_view_statement: Implement check_access This patch implements check_access according to Cassandra's implementation. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	8ce21a9c01	schema_tables: Make drop view mutations Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	61a5a74ea2	schema_tables: Make update view mutations Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	2098c336d9	schema_tables: Make create view mutations This patch builds the mutations to announce a new view. Aside from including the view schema, we include the base table mutations so that a node is resilient against receiving create view mutations before the base table create mutations. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	19a76a82e8	frozen_schema: Support view schemas This patch allows a view schema to be frozen. To unfreeze such a schema, we add an is_view attribute to the schema idl. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	c11eb30225	schema_tables: Replace add_table_to_schema_mutation This patch replaces the add_table_to_schema_mutation() function with add_table_or_view_to_schema_mutation(). Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	04b93ba803	schema_tables: Make view mutations This patch adds functions that translate a view schema to the corresponding mutations. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	fe632e8ba5	schema_tables: Factor out duplicate code This patch factors out duplicate code between merge_tables() and merge_views(). Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	3fd79bb6d6	schema_tables: Merge views for schema merging Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	06ab61a570	schema_tables: Extract update_column_family This patch extracts update_column_family from schema_tables into database so it can be used when adding materialized views, in future patches. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	40c684b5f5	database: Extract common create cf code This patch moves some duplicate code into the add_column_family_and_create_directory() function. It also saves some superfluous keyspace lookups and readies the code to be used by materialized views. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	42242273f6	schema_tables: Create views from mutations This patch enables views to be created from their low-level, mutation-based representation. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	888a8923c7	read_table_mutations: Support other schemas This patch changes read_table_mutations() so that it can now read schemas from other tables besides the column families schema table. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	b9cf25c4dd	schema_tables: Add views schema table This patch adds the views schema table, containing the definition of views in a keyspace. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	7818339791	materialized views: Add view class This patch adds the view class, which will contain functions related to populating a view, either from the base table's write path or from the view building mechanism which copies over already existing data in the base table. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Asias He	937f28d2f1	Convert to use dht::partition_range_vector and dht::token_range_vector	2016-12-19 14:08:50 +08:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Asias He	85034c1b57	Convert to use dht::partition_range	2016-12-19 08:04:30 +08:00
Asias He	d1178fa299	Convert to use dht::token_range	2016-12-19 08:04:29 +08:00
Avi Kivity	a61ff53150	Merge "rework flush criteria" from Glauber "The current criteria for memtable flush is not being respected. The problem is demonstrated to happen when the dirty memory group is over limit, and so is the system table extra allowance. In that situation, both the normal region and the system table region will be under pressure and try to flush. More specifically, because the normal region inherits from the system region, if the normal region is under pressure (over the soft limit threshold), the system region will certainly be as well, even though it has an extra allowance. This is because after virtual dirty, we start blocking when we reach half the region, but memory itself can grow up to 100 % of the region. So the total amount of memory used will be certainly bigger than the system pressure threshold, which is now 50 % plus the allowance. To fix that, this patch reworks the flush logic so that the regions are not dependent on each other. Fixes #1918" * 'flush-criteria-v6' of github.com:glommer/scylla: config: get rid of memtable_total_space database: rework dirty memory hierarchy system keyspace: write batchlog mutation in user memory database: remove flush_token database: abstract pressure condition notification database: encapsulate semaphore_units into a flush_permit database: remove friendship declaration database: simplify flush_one database: make memtable_list aware in cases it can't flush	2016-12-14 11:24:10 +02:00
Glauber Costa	2aa6514667	config: get rid of memtable_total_space Those values are now statically set. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-12-13 17:05:12 -05:00
Glauber Costa	db7cc3cba8	system keyspace: write batchlog mutation in user memory Batchlog is a potentially memory-intensive table whose workload is driven by user needs, not system's. Move it to the user dirty memory manager. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-12-13 13:59:35 -05:00
Tomasz Grabiec	059a1a4f22	db: Fix commitlog replay to not drop cell mutations with older schema column_mapping is not safe to access across shards, because data_type is not safe to access. One of the manifestations of this is that abstract_type::is_value_compatible_with() always fails if the two types belong to different shards. During replay, column_mapping lives on the replaying shard, and is used by converting_mutation_partition_applier against the schema on the target shard. Since types in the mapping will be considered incompatible with types in the schema, all cells will be dropped. Fix by using column_mapping in a safe way, by copying it to the target shard if necessary. Each shard maintains its own cache of column mappings. Fixes #1924. Message-Id: <1481310463-13868-1-git-send-email-tgrabiec@scylladb.com>	2016-12-13 12:19:32 +02:00
Glauber Costa	9b5e6d6bd8	commitlog: correctly report requests blocked The semaphore future may be unavailable for many reasons. Specifically, if the task quota is depleted right between sem.wait() and the .then() clause in get_units() the resulting future won't be available. That is particularly visible if we decrease the task quota, since those events will be more frequent: we can in those cases clearly see this counter going up, even though there aren't more requests pending than usual. This patch improves the situation by replacing that check. We now verify whether or not there are waiters in the semaphore. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <113c0d6b43cd6653ce972541baf6920e5765546b.1481222621.git.glauber@scylladb.com>	2016-12-09 15:02:26 +02:00
Tomasz Grabiec	f7197dabf8	commitlog: Fix replay to not delete dirty segments The problem is that replay will unlink any segments which were on disk at the time the replay starts. However, some of those segments may have been created by current node since the boot. If a segment is part of reserve for example, it will be unlinked by replay, but we will still use that segment to log mutations. Those mutations will not be visible to replay after a crash though. The fix is to record preexisting segments before any new segments will have a chance to be created and use that as the replay list. Introduced in `abe7358767`. dtest failure: commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup Message-Id: <1481117436-6243-1-git-send-email-tgrabiec@scylladb.com>	2016-12-07 15:54:47 +02:00
Asias He	00d7a35949	utils: Put crc32 under utils namespace It conflicts with crc in zlib Message-Id: <1480918984-4117-2-git-send-email-asias@scylladb.com>	2016-12-05 11:48:29 +02:00
Glauber Costa	99a5a77234	prevent commitlog replay position reordering during reserve refill When requests hit the commitlog, each of them will be assigned a replay position, which we expect to be ordered. If reorders happen, the request will be discarded and re-applied. Although this is supposed to be rare, it does increase our latencies, especially when big requests are involved. Processing big requests is expensive and if we have to do it twice that adds to the cost. The commitlog is supposed to issue replay positions in order, and it could be that the code that adds them to the memtables will reorder them. However, there is one instance in which the commitlog will not keep its side of the bargain. That happens when the reserve is exhausted, and we are allocating a segment directly at the same time the reserve is being replenished. The following sequence of events with its deferring points will illustrate it: on_timer: return this->allocate_segment(false). // defer here // then([this](sseg_ptr s) { At this point, the segment id is already allocated. new_segment(): if (_reserve_segments.empty()) { [ ... ] return allocate_segment(true).then ... At this point, we have a new segment that has an id that is higher than the previous id allocated. Then we resume the execution from the deferring point in on_timer(): i = _reserve_segments.emplace(i, std::move(s)); The next time we need to allocate a segment, we'll pick it from the reserve. But the segment in the reserve has an id that is lower than the id that we have already used. Reorders are bad, but this one is particularly bad: because the reorder happens with the segment id side of the replay position, that means that every request that falls into that segment will have to be reinserted. This bug can be a bit tricky to reproduce. To make it more common, we can artificially add a sleep() fiber after the allocate_segment(false) in on_timer(). 
If we do that, we'll see a sea of reinsertions going on in the logs (if dblog is set to debug). Applying this patch (keeping the sleep) will make them all disappear. We do this by rewriting the reserve logic, so that the segments always come from the reserve. If we draw from a single pool all the time, there is no chance of reordering happening. To make that more amenable, we'll have the reserve filler always running in the background and take it out of the timer code. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <49eb7edfcafaef7f1fdceb270639a9a8b50cfce7.1480531446.git.glauber@scylladb.com>	2016-12-01 13:20:46 +01:00
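
The reordering described in commit 99a5a77234 can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_segment_allocator`, `ids_with_direct_allocation()`, `ids_from_reserve_only()` and `in_order()` are hypothetical names invented for this example, futures and deferring points are replaced by plain sequential steps, and none of it is Scylla's actual commitlog code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical toy model, not Scylla's segment manager.
struct toy_segment_allocator {
    uint64_t next_id = 1;
    std::deque<uint64_t> reserve;

    // Stands in for allocate_segment(): the id is assigned immediately,
    // before the caller gets a chance to store the segment anywhere.
    uint64_t allocate_id() { return next_id++; }
};

// Old behaviour as described in the commit message: a direct allocation
// races with the timer that refills the reserve.
std::vector<uint64_t> ids_with_direct_allocation() {
    toy_segment_allocator a;
    std::vector<uint64_t> used;
    uint64_t pending = a.allocate_id();  // on_timer: id 1 assigned, then it defers
    used.push_back(a.allocate_id());     // new_segment(): reserve empty, id 2 used directly
    a.reserve.push_back(pending);        // on_timer resumes: id 1 lands in the reserve
    used.push_back(a.reserve.front());   // next new_segment(): draws id 1 from the reserve
    a.reserve.pop_front();
    return used;
}

// Behaviour after the patch, in spirit: every segment comes from the reserve,
// which a background filler keeps topped up.
std::vector<uint64_t> ids_from_reserve_only() {
    toy_segment_allocator a;
    std::vector<uint64_t> used;
    for (int i = 0; i < 2; ++i) {
        a.reserve.push_back(a.allocate_id());  // background filler
    }
    for (int i = 0; i < 2; ++i) {
        assert(!a.reserve.empty());            // a real caller would wait here instead
        used.push_back(a.reserve.front());
        a.reserve.pop_front();
    }
    return used;
}

// True when segment ids were handed out in non-decreasing order.
bool in_order(const std::vector<uint64_t>& ids) {
    return std::is_sorted(ids.begin(), ids.end());
}
```

In this model the first function hands out ids 2 and then 1, the out-of-order case that forces reinsertion, while the second hands out 1 and then 2 because there is a single source of segments.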

779 Commits