Store the failure_detector object inside the gossiper object.
- The global sharded<failure_detector> object is gone.
- No need to initialize sharded<failure_detector> manually, which
  simplifies the code in tests/cql_test_env.cc and init.cc.
"
This series adds support for local indexing, i.e. when the index table
resides on the same partition as the base data.
It addresses the performance issue of an indexed query
that also specifies a partition key - the index will be queried
locally.
"
* 'add_local_indexing_11' of https://github.com/psarna/scylla: (30 commits)
tests: add cases for local index prefix optimization
tests: add create/drop local index test case
tests: add non-standard names cases to local index tests
tests: add multi pk case for local index tests
tests: add test for malformed local index definitions
tests: add local index paging test
tests: add local indexing test
cql3: add CREATE INDEX syntax for local indexes
cql3: use serialization function to create index target string
index: add serialization function for index targets
index: use proper local index target when adding index
index: add parsing target column name from local index targets
db: add checking for local index in schema tables
index: add checking if serialized target implies local index
index: enable parsing multi-key targets
index: move target parser code to .cc file
json: add non-throwing overload for to_json_value
cql3: add checking for local indexes in has_supporting_index()
cql3: move finding index restrictions to prepare stage
cql3: add picking an index by score
...
db::schema_tables::ALL and db::schema_tables::all_tables() are both supposed
to list the same schema tables - the former is the list of their names, and
the latter is the list of their schemas. This code duplication makes it easy
to forget to update one of them, and indeed recently the new
"view_virtual_columns" was added to all_tables() but not to ALL.
What this patch does is to make ALL a function instead of a constant vector.
The newly named all_table_names() function uses all_tables(), so the list
of schema tables only appears once.
So that nobody worries about the performance impact, all_table_names()
caches the list in a per-thread vector that is only prepared once per thread.
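A minimal sketch of the caching idea; all_tables() and the cf_name()
accessor are assumed from the surrounding code, and the exact signature
is illustrative:

    #include <seastar/core/sstring.hh>
    #include <vector>

    using seastar::sstring;

    // Build the name list once per thread from all_tables(), so the
    // list of schema tables exists in only one place.
    const std::vector<sstring>& all_table_names() {
        static thread_local std::vector<sstring> names = [] {
            std::vector<sstring> v;
            for (auto& s : all_tables()) {
                v.push_back(s->cf_name());
            }
            return v;
        }();
        return names;
    }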
Because after this patch all_table_names() includes the "view_virtual_columns"
table that was previously missing, this patch also fixes #4339, which was about
virtual columns in materialized views not being propagated to other nodes.
Unfortunately, to test the fix for #4339 we need a test with multiple
nodes, so we cannot test it here in a unit test, and will instead use
the dtest framework, in a separate patch.
Fixes #4339
Branches: 3.0
Tests: all unit tests (release and debug mode), new dtest for #4339. The unit test mutation_reader_test failed in debug mode but not in release mode, but this probably has nothing to do with this patch (?).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190320063437.32731-1-nyh@scylladb.com>
This is analogous to the system.large_rows table, but holds individual
cells, so it also needs the column name.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
With these changes, the futures returned by large_data_handler will not
normally wait for entries to be written to system.large_rows or
system.large_partitions.
We use a semaphore to bound how behind system.large_* table updates
can get.
This should avoid delaying sstable writes in the common case, which
is more relevant once we warn of large cells, since the default
threshold will be just 1MB.
Note that there is no ordering between the various maybe_record_* and
maybe_delete_large_data_entries requests. This means that we can end
up with a stale entry that is only removed once the TTL expires.
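A minimal sketch of the bounding idea using Seastar's semaphore; the
class name, the limit, and the exact shape of the API are illustrative,
not the actual large_data_handler code:

    #include <seastar/core/future.hh>
    #include <seastar/core/semaphore.hh>

    class background_recorder {
        // Bounds how many system.large_* updates may be in flight at once.
        seastar::semaphore _sem{16}; // limit chosen only for illustration
    public:
        template <typename Func>
        seastar::future<> record(Func update) {
            // The caller only waits for a semaphore unit, not for the
            // update itself; the unit is released when the background
            // write finishes, bounding how far behind the tables can get.
            return seastar::get_units(_sem, 1).then(
                [update = std::move(update)](auto units) mutable {
                    (void)update().finally([units = std::move(units)] {});
                });
        }
    };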
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
These will use a member semaphore variable in a follow-up patch, so they
cannot be const.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This should have been changed in the patch
"db: stop the commit log after the tables during shutdown",
but unfortunately I missed it then.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
We had almost identical error handling for large_partitions and
large_rows. Refactor in preparation for large_cells.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This renames it to record_large_partitions, which matches
record_large_rows. It also changes the signature to be closer to
record_large_rows.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The code for deleting entries from system.large_partitions was almost
a duplicate of the code for deleting entries from system.large_rows.
This patch unifies the two, which also improves the error message when
we fail to delete entries from system.large_partitions.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This allows for system.large_partitions to be updated if a large
partition is found while writing the last sstables.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
"
This fixes #3988.
We already have a system.large_partitions table, but only a warning for
large rows. These patches close the gap by also recording large rows
into a new system.large_rows table.
"
* 'espindola/large-row-add-table-v6' of https://github.com/espindola/scylla:
Add a testcase for large rows
Populate system.large_rows.
Create a system.large_rows table
Extract a key_to_str helper
Don't call record_large_rows if stopped
Add a delete_large_rows_entries method to large_data_handler
db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void
Rename maybe_delete_large_partitions_entry
Rename log_large_row to record_large_rows
Rename maybe_log_large_row to maybe_record_large_rows
"
This series contains minor improvements to commitlog log messages that
helped in investigating #4231, but are not specific to that bug.
"
* tag 'improve-commitlog-logs/v1' of https://github.com/pdziepak/scylla:
commitlog: use consistent chunk offsets in logs
commitlog: provide more information in logs
commitlog: remove unnecessary comment
Logs in the commitlog writer use the file offset of the chunk header to
identify chunks. However, the replayer was using the offset just after
the header for the same purpose, which caused unnecessary confusion by
suggesting that the replayer was reading at the wrong position.
This patch changes the replayer so that it reports chunk header offsets.
This commit adds some more information to the logs, motivated by the
experience of investigating #4231:
* size of each write
* position of each write
* log message for final write
Commitlog files contain multiple chunks. Each chunk starts as a single
(possibly fragmented) buffer. The size of that buffer in memory may be
larger than the size in the file.
cycle() was incorrectly using the in-memory size to write the whole
buffer to the file. That sometimes caused data corruption, since a
smaller on-file size was used to compute the offset of the next chunk
and there could be multiple chunk writes happening at the same time.
This patch solves the issue by ensuring that only the actual on-file
size of the chunk is written.
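A sketch of the fix; the names are illustrative, and real code must
also respect DMA alignment requirements:

    #include <seastar/core/file.hh>
    #include <seastar/core/future.hh>
    #include <cassert>
    #include <cstdint>

    // Write exactly the on-file size of the chunk, since that is what
    // the offset of the next chunk is computed from. The bug was passing
    // the (possibly larger) in-memory buffer size here instead.
    seastar::future<> write_chunk(seastar::file& f, uint64_t off,
                                  const char* buf, size_t on_file_size) {
        return f.dma_write(off, buf, on_file_size)
            .then([on_file_size](size_t written) {
                // A short write would be its own error to handle.
                assert(written == on_file_size);
            });
    }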
When looking for optimization paths, columns selected in a view
are checked against multiple conditions - unfortunately, virtual
columns were erroneously skipped from that check, which resulted
in ignoring their TTLs. That can lead to over-optimizing
and not including vital liveness info in view rows,
which can then result in rows disappearing too early.
It now records large rows when they are first written to an sstable
and removes them when the sstable is deleted.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This is analogous to the system.large_partitions table, but holds
individual rows, so it also needs the clustering key of the large
rows.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The implementations of large_data_handler should only be called if
the large_data_handler hasn't been stopped yet.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
These functions will record into tables in a follow-up patch, so they
will need to return a future.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The shard-aware drivers can cause a huge number of connections to be created
when there are tens of thousands of clients. While normally the shard-aware
drivers are beneficial, in those cases they can consume too much memory.
Provide an option to disable shard awareness from the server (it is likely to
be easier to do this on the server than to reprovision those thousands of
clients).
Tests: manual test with wireshark.
Message-Id: <20190223173331.24424-1-avi@scylladb.com>
1. We would like to be able to call maybe_delete_large_partitions_entry
from the sstable destructor path in the future, so the sstable might go away
while the large data entries are being deleted.
2. We would like the caller to handle any exception on this path,
especially in the preparation part, before calling delete_large_partitions_entry().
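A hypothetical sketch of the resulting call pattern, based only on the
names in this message (the helper and signatures are assumptions):

    #include <seastar/core/future.hh>

    // Split the work so that (2) exceptions surface in the synchronous
    // preparation part, and (1) the background deletion no longer
    // references the sstable.
    seastar::future<> maybe_delete_large_partitions_entry(const sstable& sst) {
        auto keys = extract_keys(sst); // assumed helper; may throw, and
                                       // the caller sees that exception
        // Nothing below refers to 'sst' any more, so the sstable can be
        // destroyed while the deletion is still in flight.
        return delete_large_partitions_entry(std::move(keys));
    }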
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Can be useful in diagnosing problems with application of schema
mutations.
do_merge_schema() is called on every schema change on the local
node.
create_table_from_mutations() is called on schema merge when a table
was altered or created using mutations read from local schema tables
after applying the change, or when loading schema on boot.
Message-Id: <20190221093929.8929-2-tgrabiec@scylladb.com>
"
cryptopp's config.h has the following pragma:
#pragma GCC diagnostic ignored "-Wunused-function"
It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.
This patch series introduces a single .cc file that has to include
cryptopp headers.
"
* 'avoid-cryptopp-v3' of https://github.com/espindola/scylla:
Avoid including cryptopp headers
Delete dead code
cryptopp's config.h has the following pragma:
#pragma GCC diagnostic ignored "-Wunused-function"
It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.
The issue has been reported as
https://github.com/weidai11/cryptopp/issues/793
To work around it, this patch uses a pimpl to have a single .cc file
that has to include cryptopp headers.
While at it, it also reduces the differences and code duplication
between the md5 and sha1 hashers.
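A minimal pimpl sketch of the idea; the class and member names are
illustrative, not the actual Scylla hasher interface:

    // md5_hasher.hh - no cryptopp includes anywhere in this header.
    #include <memory>
    #include <cstddef>

    class md5_hasher {
        struct impl;                 // holds CryptoPP::Weak::MD5; defined
        std::unique_ptr<impl> _impl; // only inside md5_hasher.cc
    public:
        md5_hasher();
        ~md5_hasher();               // defined in the .cc, where impl is complete
        void update(const char* data, size_t len);
    };

    // md5_hasher.cc - the single .cc file that includes cryptopp headers,
    // so the unbalanced "-Wunused-function" pragma affects only this file.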
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
In some cases, generating view updates for columns that were not
selected in the CREATE VIEW statement is redundant - it is the case
when the update will not influence row liveness in any way.
Currently, these cases are optimized out (restated as a predicate
in the sketch below):
- the row marker is live and only unselected columns were updated;
- the row marker is not live and only unselected columns were updated,
  and in the process nothing was created or deleted and there was
  no TTL involved.
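A hedged restatement of the two cases as a predicate; all names are
illustrative rather than taken from the actual code:

    // Returns true when generating a view update may be skipped because
    // it cannot influence row liveness.
    bool can_skip_view_update(bool only_unselected_columns_updated,
                              bool row_marker_live,
                              bool created_or_deleted,
                              bool ttl_involved) {
        if (!only_unselected_columns_updated) {
            return false;               // selected columns changed
        }
        if (row_marker_live) {
            return true;                // first optimized case
        }
        // second case: marker not live, nothing created or deleted,
        // and no TTL involved
        return !created_or_deleted && !ttl_involved;
    }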
It's detrimental to keep querying the index manager every time to check
whether a view is backing a secondary index, so this value is cached
at construction time.
At the same time, this value is not simply passed to view_info
when it is created in the secondary index manager, in order to
decouple materialized view logic from secondary indexes as much as
possible (the sole existence of is_index() is bad enough).
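A minimal sketch of the caching, assuming only that the answer can be
computed once at construction; names are illustrative:

    #include <functional>

    class view_index_flag {
        const bool _is_index;  // asked of the index manager exactly once
    public:
        explicit view_index_flag(const std::function<bool()>& query)
            : _is_index(query()) {}
        bool is_index() const { return _is_index; }
    };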
Give the constant 1024*1024 introduced in an earlier commit a name,
"batch_memory_max", and move it from view.cc to view_builder.hh.
It now resides next to the pre-existing constant that controlled how
many rows were read in each build step, "batch_size".
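In view_builder.hh this amounts to something like the following sketch
(the exact declarations are assumptions; the values come from this
series):

    // How many base rows to read per build step...
    static constexpr size_t batch_size = 128;
    // ...and how much memory a batch may hold before we stop batching.
    static constexpr size_t batch_memory_max = 1024 * 1024; // 1MB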
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190217100222.15673-1-nyh@scylladb.com>
The included testcase used to crash because during database::stop() we
would try to update system.large_partition.
There doesn't seem to be an order in which we can stop the existing
services in cql_test_env that avoids this.
This patch therefore adds another step when shutting down a database:
first stop updating system.large_partition.
This means that during shutdown any memtable flush, compaction or
sstable deletion will not be reflected in system.large_partition. This
is hopefully not too bad since the data in the table is TTLed.
This seems to impact only tests, since main.cc calls _exit directly.
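A sketch of the added step; the member and helper names are
assumptions, not the actual shutdown code:

    #include <seastar/core/future.hh>

    // The first shutdown step quiesces large-data recording, so later
    // memtable flushes, compactions and sstable deletions no longer try
    // to write to system.large_partition.
    seastar::future<> database::stop() {
        return _large_data_handler->stop().then([this] {
            return stop_remaining_services(); // placeholder for existing steps
        });
    }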
Tests: unit (release,debug)
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190213194851.117692-1-espindola@scylladb.com>
The bulk materialized-view building process (when adding a materialized
view to a table with existing data) currently reads the base table in
batches of 128 (view_builder::batch_size) rows. This is clearly better
than reading entire partitions (which may be huge), but still, 128 rows
may grow pretty large when we have rows with large strings or blobs,
and there is no real reason to buffer 128 rows when they are large.
Instead, when the rows we read so far exceed some size threshold (in this
patch, 1MB), we can operate on them immediately instead of waiting for
128.
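A sketch of the resulting batching rule with the two thresholds from
this patch; the row type and accessors are illustrative:

    // Flush the current batch when either threshold is hit: 128 rows
    // (view_builder::batch_size) or 1MB of accumulated row data.
    constexpr size_t batch_size = 128;
    constexpr size_t batch_memory_max = 1024 * 1024;

    size_t rows_in_batch = 0;
    size_t bytes_in_batch = 0;
    for (auto& row : base_rows) {             // illustrative container
        consume(row);                         // add the row to the batch
        bytes_in_batch += row.memory_usage(); // assumed accessor
        if (++rows_in_batch >= batch_size || bytes_in_batch >= batch_memory_max) {
            flush_batch();                    // build view updates now
            rows_in_batch = bytes_in_batch = 0;
        }
    }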
As a side-effect, this patch also solves another bug: in the worst case, all
the base rows of one batch may be written into one output view partition,
in one mutation. But there is a hard limit on the size of one mutation
(commitlog_segment_size_in_mb, by default 32MB), so we cannot allow the
batch size to exceed this limit. By not batching further after 1MB,
we avoid reaching this limit when individual rows do not reach it but
128 of them would.
Fixes#4213.
This patch also includes a unit test reproducing #4213, and demonstrating
that it is now solved.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190214093424.7172-1-nyh@scylladb.com>
Fixes #4222
If an extension creation callback returns null (rather than throwing an
exception), we treat this as "I'm not needed" and simply ignore it.
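A minimal sketch of that convention; the types and the registration
call are illustrative:

    #include <memory>

    struct extension;                   // opaque here
    using extension_ptr = std::unique_ptr<extension>;

    void maybe_register(extension_ptr ext) {
        if (!ext) {
            return;                     // null from the creation callback:
        }                               // "I'm not needed" - just skip it
        // register_extension(std::move(ext)); // the normal path (assumed)
    }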
Message-Id: <20190213124311.23238-1-calle@scylladb.com>
Fixes #4083
Instead of a sharded collection in system.local, use a
dedicated system table (system.truncated) to store
truncation positions. This makes querying and updating easier,
and is easier on memory.
The code also migrates any existing truncation
positions on startup and clears the old data.
Refs #4085
Changes the commitlog descriptor to both accept "Recycled-Commitlog..."
file names and preserve said name in the descriptor.
This ensures we pick up the not-yet-used recycled segments left
over from a crash for replay. The replay in turn will simply ignore
the recycled files, and after the actual replay they will be deleted
as needed.
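A hedged sketch of the acceptance side; the exact prefix spelling and
the descriptor layout are assumptions:

    #include <string>

    // Accept recycled segment names alongside live ones, and keep
    // whichever name was seen so the file can later be deleted under it.
    bool is_recycled_segment_name(const std::string& filename) {
        return filename.rfind("Recycled-Commitlog", 0) == 0; // prefix test
    }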
Message-Id: <20190129123311.16050-1-calle@scylladb.com>
Right now Cassandra SSTables with counters cannot be imported into
Scylla. The reason for that is that Cassandra changed their counter
representation in their 2.1 version and kept transparently supporting
both representations. We do not support their old representation, nor
is there a sane way to figure out by looking at the data which one is
in use.
For safety, we had made the decision long ago to not import any
tables with counters: if a counter was generated in older Cassandra, we
would misrepresent it.
In this patch, I propose we offer a non-default way to import SSTables
with counters: we can gate it with a flag, and trust that the user knows
what they are doing when flipping it (at their own peril). Cassandra 2.1
is by now pretty old; many users can safely say they've never used
anything older.
While there are tools like sstableloader that can be used to import
those counters, there are often situations in which directly importing
SSTables is better, faster, or simply the only option left. I
argue that having a flag that allows us to import them when we are sure
it is safe is better than having no option at all.
With this patch I was able to successfully import Cassandra tables with
counters that were generated in Cassandra 2.1, reshard and compact their
SSTables, and read the data back to get the same values in Scylla as in
Cassandra.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190210154028.12472-1-glauber@scylladb.com>