scylladb

Author	SHA1	Message	Date
Avi Kivity	da0a25859b	Merge "Improvements to commitlog logs" from Paweł " This series contains minor improvements to commitlog log messages that have helped investigating #4231, but are not specific to that bug. " * tag 'improve-commitlog-logs/v1' of https://github.com/pdziepak/scylla: commitlog: use consistent chunk offsets in logs commitlog: provide more information in logs commitlog: remove unnecessary comment	2019-03-04 14:52:46 +02:00
Paweł Dziepak	00b33de25c	commitlog: use consistent chunk offsets in logs Logs in commitlog writer use offset in the file of the chunk header to identify chunks. However, the replayer is using offset after the header for the same purpose. This causes unnecessary confusion suggesting that the replayer is reading at the wrong position. This patch changes the replayer so that it reports chunk header offsets.	2019-03-04 12:15:50 +00:00
Paweł Dziepak	813b00a1a6	commitlog: provide more information in logs This commit adds more information to the logs, motivated by experience investigating #4231: * size of each write * position of each write * log message for final write	2019-03-04 12:15:50 +00:00
Paweł Dziepak	1a657e9c5f	commitlog: remove unnecessary comment	2019-03-04 12:15:50 +00:00
Paweł Dziepak	434023425d	commitlog: write the correct buffer size Commitlog files contain multiple chunks. Each chunk starts as a single (possibly fragmented) buffer. The size of that buffer in memory may be larger than the size in the file. cycle() was incorrectly using the in-memory size to write the whole buffer to the file. That sometimes caused data corruption, since a smaller on-file size was used to compute the offset of the next chunk and there could be multiple chunk writes happening at the same time. This patch solves the issue by ensuring that only the actual on-file size of the chunk is written.	2019-03-04 10:25:48 +00:00
Piotr Sarna	5f85a7a821	db,view: fix virtual columns liveness checks When looking for optimization paths, columns selected in a view are checked against multiple conditions - unfortunately virtual columns were erroneously skipped from that check, which resulted in ignoring their TTLs. That can lead to overoptimizing and not including vital liveness info into view rows, which can then result in row disappearing too early.	2019-02-28 10:47:19 +01:00
Avi Kivity	5f94bc902a	transport: add option to disable shard-aware drivers The shard-aware drivers can cause a huge amount of connections to be created when there are tens of thousands of clients. While normally the shard-aware drivers are beneficial, in those cases they can consume too much memory. Provide an option to disable shard awareness from the server (it is likely to be easier to do this on the server than to reprovision those thousands of clients). Tests: manual test with wireshark. Message-Id: <20190223173331.24424-1-avi@scylladb.com>	2019-02-26 12:44:11 +01:00
Benny Halevy	13ffda5c31	database: maybe_delete_large_partitions_entry: do not access sstable and do not mask exceptions 1. We would like to be able to call maybe_delete_large_partitions_entry from the sstable destructor path in the future so the sstable might go away while the large data entries are being deleted. 2. We would like the caller to handle any exception on this path, especially in the preparation part, before calling delete_large_partitions_entry(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-02-22 10:44:02 +02:00
Tomasz Grabiec	8687666169	schema_tables: Add trace-level logging of schema mutations Can be useful in diagnosing problems with application of schema mutations. do_merge_schema() is called on every change of schema of the local node. create_table_from_mutations() is called on schema merge when a table was altered or created using mutations read from local schema tables after applying the change, or when loading schema on boot. Message-Id: <20190221093929.8929-2-tgrabiec@scylladb.com>	2019-02-21 12:16:38 +02:00
Avi Kivity	9adfd11374	Merge "Avoid including cryptopp headers" from Rafael " cryptopp's config.h has the following pragma: #pragma GCC diagnostic ignored "-Wunused-function" It is not wrapped in a push/pop. Because of that, including cryptopp headers disables that warning on scylla code too. This patch series introduces a single .cc file that has to include cryptopp headers. " * 'avoid-cryptopp-v3' of https://github.com/espindola/scylla: Avoid including cryptopp headers Delete dead code	2019-02-21 10:31:20 +02:00
Rafael Ávila de Espíndola	fd5ea2df5a	Avoid including cryptopp headers cryptopp's config.h has the following pragma: #pragma GCC diagnostic ignored "-Wunused-function" It is not wrapped in a push/pop. Because of that, including cryptopp headers disables that warning on scylla code too. The issue has been reported as https://github.com/weidai11/cryptopp/issues/793 To work around it, this patch uses a pimpl to have a single .cc file that has to include cryptopp headers. While at it, it also reduces the differences and code duplication between the md5 and sha1 hashers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-20 08:03:46 -08:00
Piotr Sarna	bd52e05ae2	view: minimize generated view updates for unselected columns In some cases generating view updates for columns that were not selected in the CREATE VIEW statement is redundant: this is the case when the update will not influence row liveness in any way. Currently, these cases are optimized out: - row marker is live and only unselected columns were updated; - row marker is not live and only unselected columns were updated, and in the process nothing was created or deleted and there was no TTL involved;	2019-02-20 14:05:27 +01:00
Piotr Sarna	dbe8491655	view: cache is_index for view pointer It's detrimental to keep querying index manager whether a view is backing a secondary index every time, so this value is cached at construct time. At the same time, this value is not simply passed to view_info when being created in secondary index manager, in order to decouple materialized view logic from secondary indexes as much as possible (the sole existence of is_index() is bad enough).	2019-02-20 12:52:32 +01:00
Nadav Har'El	05db7d8957	Materialized views: name the "batch_memory_max" constant Give the constant 1024*1024 introduced in an earlier commit a name, "batch_memory_max", and move it from view.cc to view_builder.hh. It now resides next to the pre-existing constant that controlled how many rows were read in each build step, "batch_size". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190217100222.15673-1-nyh@scylladb.com>	2019-02-17 13:28:16 +00:00
Rafael Ávila de Espíndola	9cd14f2602	Don't write to system.large_partition during shutdown The included testcase used to crash because during database::stop() we would try to update system.large_partition. There doesn't seem to be an order we can stop the existing services in cql_test_env that makes this possible. This patch then adds another step when shutting down a database: first stop updating system.large_partition. This means that during shutdown any memtable flush, compaction or sstable deletion will not be reflected in system.large_partition. This is hopefully not too bad since the data in the table is TTLed. This seems to impact only tests, since main.cc calls _exit directly. Tests: unit (release,debug) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190213194851.117692-1-espindola@scylladb.com>	2019-02-15 10:49:10 +01:00
Gleb Natapov	0b84b04f97	consistency_level: make it more const correct Message-Id: <20190214122631.GF19055@scylladb.com>	2019-02-14 14:52:51 +02:00
Nadav Har'El	fec562ec8f	Materialized views: limit size of row batching during bulk view building The bulk materialized-view building processes (when adding a materialized view to a table with existing data) currently reads the base table in batches of 128 (view_builder::batch_size) rows. This is clearly better than reading entire partitions (which may be huge), but still, 128 rows may grow pretty large when we have rows with large strings or blobs, and there is no real reason to buffer 128 rows when they are large. Instead, when the rows we read so far exceed some size threshold (in this patch, 1MB), we can operate on them immediately instead of waiting for 128. As a side-effect, this patch also solves another bug: At worst case, all the base rows of one batch may be written into one output view partition, in one mutation. But there is a hard limit on the size of one mutation (commitlog_segment_size_in_mb, by default 32MB), so we cannot allow the batch size to exceed this limit. By not batching further after 1MB, we avoid reaching this limit when individual rows do not reach it but 128 of them did. Fixes #4213. This patch also includes a unit test reproducing #4213, and demonstrating that it is now solved. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190214093424.7172-1-nyh@scylladb.com>	2019-02-14 12:04:40 +02:00
Calle Wilund	e70286a849	db/extensions: Allow schema extensions to turn themselves off Fixes #4222 Iff an extension creation callback returns null (not exception) we treat this as "I'm not needed" and simply ignore it. Message-Id: <20190213124311.23238-1-calle@scylladb.com>	2019-02-13 14:50:51 +02:00
Calle Wilund	4e657c0633	system_keyspace: Add waitable for trunc. migration For tests. Hooray for separation of concern.	2019-02-13 09:08:12 +00:00
Calle Wilund	64e8c6f31d	storage_service: Add features disabling for tests	2019-02-13 09:08:12 +00:00
Calle Wilund	12ebcf1ec7	commitlog_replay: Use dedicated table for truncation Fixes #4083 Instead of sharded collection in system.local, use a dedicated system table (system.truncated) to store truncation positions. Makes query/update easier and easier on the query memory. The code also migrates any existing truncation positions on startup and clears the old data.	2019-02-13 09:08:12 +00:00
Calle Wilund	4a52ed7884	commitlog: Accept recycled (not yet re-used) segments in replay Refs #4085 Changes commitlog descriptor to both accept "Recycled-Commitlog..." file names, and preserve said name in the descriptor. This ensures we pick up the not-yet-used recycled segments left from a crash for replay. The replay in turn will simply ignore the recycled files, and post actual replay they will be deleted as needed. Message-Id: <20190129123311.16050-1-calle@scylladb.com>	2019-02-12 12:23:55 +02:00
Glauber Costa	e0bfd1c40a	allow Cassandra SSTables with counters to be imported if they are new enough Right now Cassandra SSTables with counters cannot be imported into Scylla. The reason for that is that Cassandra changed their counter representation in their 2.1 version and kept transparently supporting both representations. We do not support their old representation, nor is there a sane way to figure out by looking at the data which one is in use. For safety, we had made the decision long ago to not import any tables with counters: if a counter was generated in older Cassandra, we would misrepresent it. In this patch, I propose we offer a non-default way to import SSTables with counters: we can gate it with a flag, and trust that the user knows what they are doing when flipping it (at their own peril). Cassandra 2.1 is by now pretty old. Many users can safely say they've never used anything older. While there are tools like sstableloader that can be used to import those counters, there are often situations in which directly importing SSTables is either better, faster, or worse: the only option left. I argue that having a flag that allows us to import them when we are sure it is safe is better than having no option at all. With this patch I was able to successfully import Cassandra tables with counters that were generated in Cassandra 2.1, reshard and compact their SSTables, and read the data back to get the same values in Scylla as in Cassandra. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190210154028.12472-1-glauber@scylladb.com>	2019-02-10 17:50:48 +02:00
Calle Wilund	ba6a8ef35b	tls: Use a default prio string disabling TLS1.0 forcing min 128bits Fixes #4010 Unless user sets this explicitly, we should try explicitly avoid deprecated protocol versions. While gnutls should do this for connections initiated thusly, clients such as drivers etc might use obsolete versions. Message-Id: <20190107131513.30197-1-calle@scylladb.com>	2019-02-05 15:34:18 +02:00
Avi Kivity	6c71eae63f	Merge "API: Stream compaction history records" from Amnon " get_compaction_history can return a lot of records which will add up to a big http reply. This series makes sure it will not create large allocations when returning the results. It adds an api to the query_processor to use paged queries with a consumer function that returns a future, this way we can use the http stream after each record. This implementation will prevent large allocations and stalls. Fixes #4152 " * 'amnon/compaction_history_stream_v7' of github.com:scylladb/seastar-dev: tests/query_processor_test: add query_with_consumer_test system_keyspace, api: stream get_compaction_history query_processor: query and for_each_cql_result with future	2019-02-05 14:16:36 +02:00
Amnon Heiman	6c7742d616	system_keyspace, api: stream get_compaction_history get_compaction_history can return a big chunk of data. To prevent large memory allocations, get_compaction_history now reads each compaction_history record and uses the HTTP stream to send it. Fixes #4152 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-02-05 11:14:53 +02:00
Piotr Jastrzebski	834bec5cc9	Read shard awareness columns as dropped Without this new version of Scylla won't be able to start with system tables inherited after older version that had shard awareness columns. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <cb62f20fc0c98f532c6f4ad5e08b3794951e85bd.1549289050.git.piotr@scylladb.com>	2019-02-04 18:43:11 +02:00
Calle Wilund	9cadbaa96f	commitlog_replayer: Bugfix: finding truncation positions uses local var ref "uuid" was ref:ed in a continuation. Works 99.9% of the time because the continuation is not actually delayed (and assuming we begin the checks with non-truncated (system) cf:s it works). But if we do delay continuation, the resulting cf map will be borked. Fixes #4187. Message-Id: <20190204141831.3387-1-calle@scylladb.com>	2019-02-04 16:51:13 +02:00
Avi Kivity	468f8c7ee7	Merge "Print a warning if a row is too large" from Rafael " This is a first step in fixing #3988. " * 'espindola/large-row-warn-only-v4' of https://github.com/espindola/scylla: Rename large_partition_handler Print a warning if a row is too large Remove defaut parameter value Rename _threshold_bytes to _partition_threshold_bytes keys: add schema-aware printing for clustering_key_prefix	2019-02-03 13:57:42 +02:00
Piotr Jastrzebski	ad217bbdc7	Revert "system_keyspace: add sharding information to local table" This reverts commit `bdce561ada`. Those columns are not used and cause problems with tools. Refs #4112 Message-Id: <c772ebc0ebc001e5bdf229424c6d51dc58cd5d2e.1548945023.git.piotr@scylladb.com>	2019-01-31 19:06:55 +01:00
Rafael Ávila de Espíndola	625080b414	Rename large_partition_handler Now that it also handles large rows, rename it to large_data_handler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 15:03:14 -08:00
Rafael Ávila de Espíndola	1185138a34	Print a warning if a row is too large Tests: unit (release) Refs #3988. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 15:03:10 -08:00
Rafael Ávila de Espíndola	776d5bb9e2	Remove defaut parameter value The value is already passed by cql_table_large_partition_handler, so the default was just for nop_large_partition_handler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 13:02:01 -08:00
Rafael Ávila de Espíndola	30528fa853	Rename _threshold_bytes to _partition_threshold_bytes A followup patch will add a threshold for rows. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 13:02:01 -08:00
Duarte Nunes	ea34e242de	Merge 'Do not use hints for view building' from Piotr " This series prevents view building from falling back to storing hints. Instead, it will try to send hints to an endpoint as if it has consistency level ONE, and in case of failure retry the whole building step. Then, view building will never be marked as finished prematurely (because of pending hints), which will help avoid creating inconsistencies when decommissioning a node from the cluster. Tests: unit (release) dtest (materialized_views_test.py.) Fixes #3857 Fixes #4039 " * 'do_not_mark_view_as_built_with_hints_7' of https://github.com/psarna/scylla: db,view: add updating view_building_paused statistics database: add view_building_paused metrics table: make populate_views not allow hints db,view: add allow_hints parameter to mutate_MV storage_proxy: add allow_hints parameter to send_to_endpoint	2019-01-28 10:31:14 +00:00
Piotr Sarna	9a6261ca27	db,view: add updating view_building_paused statistics Each time view building is paused because of a connection failure, the view_building_paused metric is bumped.	2019-01-28 09:38:42 +01:00
Piotr Sarna	e30cf22956	db,view: add allow_hints parameter to mutate_MV Mutating MV function can now accept a parameter whether hints should be allowed during sending mutations to endpoints.	2019-01-28 09:38:42 +01:00
Piotr Sarna	e0fe9ce2c0	storage_proxy: add allow_hints parameter to send_to_endpoint With hints allowed, send_to_endpoint will leverage consistency level ANY to send data. Otherwise, it will use the default - cl::ONE.	2019-01-28 09:38:41 +01:00
Rafael Ávila de Espíndola	5332ebd50c	Update the description of compaction_large_partition_warning_threshold_mb Despite the name, this option also controls if a warning is issued during memtable writes. Warning during memtable writes is useful but the option name also exists in cassandra, so probably the best we can do is update the description. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190125020821.72815-1-espindola@scylladb.com>	2019-01-28 09:09:35 +02:00
Piotr Jastrzebski	ad016a732b	Move set_type_impl out of types.hh to types/set.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	b1e1b66732	Move list_type_impl out of types.hh to types/list.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	147cc031db	Move map_type_impl out of types.hh to types/map.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:56:38 +01:00
Piotr Jastrzebski	7666e81b51	Decouple database.hh from types/user.hh This commit declares shared_ptr<user_types_metadata> in database.hh where user_types_metadata is an incomplete type, so it requires "Allow to use shared_ptr with incomplete type other than sstable" to compile correctly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:55:04 +01:00
Piotr Jastrzebski	e92b4c3dbc	Move user_type_impl out of types.hh to types/user.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:04:04 +01:00
Rafael Ávila de Espíndola	f7d1dc16d4	database: Use nop_large_partition_handler to avoid self-reporting Currently nop_large_partition_handler is only used in tests, but it can also be used avoid self-reporting. Tests: unit(Release) I also tested starting scylla with --compaction-large-partition-warning-threshold-mb=0. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190123205059.39573-1-espindola@scylladb.com>	2019-01-23 21:11:21 +00:00
Duarte Nunes	88c7c1e851	Merge 'hinted handoff: cache cf mappings' from Vlad " Cache cf mappings when sending breaks off in the middle of a segment, so that the sender has them the next time it resumes sending this segment from where it left off. Also add the "discard" metric so that we can track hints that are being discarded in the send flow. " Fixes #4122 * 'hinted_handoff_cache_cf_mappings-v1' of https://github.com/vladzcloudius/scylla: hinted handoff: cache column family mappings for segments that were not sent out in full hinted handoff: add a "discarded" metric	2019-01-23 00:44:41 +00:00
Vlad Zolotarov	34829b8f81	hinted handoff: cache column family mappings for segments that were not sent out in full We will try to send a particular segment later (in 1s) from the place where we left off if it wasn't sent out in full before. However, we may miss some of the column family mappings when we get back to sending this file and start sending from some entry in the middle of it (where we left off) if we didn't save the column family mappings we cached while reading this segment from its beginning. This happens because commitlog doesn't save column family information in every entry but rather once for each unique column family (version) per "cycle" (see commitlog::segment description for more info). Therefore we have to assume that a particular column family mapping appears once in the whole segment (worst case). And therefore, when we decide to resume sending a segment we need to keep the column family mappings we accumulated so far and drop them only after we are done with this particular segment (sent it out in full). Fixes #4122 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-01-22 15:24:22 -05:00
Vlad Zolotarov	4516a8cfc4	hinted handoff: add a "discarded" metric Account for the number of hints that were discarded in the send path. This may happen, for instance, due to a schema change or because a hint is too old. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-01-22 14:11:09 -05:00
Benny Halevy	93270dd8e0	gc_clock: make 64 bit Fixes: #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Benny Halevy	9878b36895	db: get default_time_to_live as int32_t rather than gc_clock::rep Otherwise, value_cast<> throws std::bad_cast exception when gc_clock::rep is defined as int64_t. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
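
The row batching described in fec562ec8f and 05db7d8957 (close a batch once it holds batch_size rows or its accumulated size passes batch_memory_max) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from view.cc: split_into_batches is a hypothetical helper, rows are modelled as plain strings, and whether the real size check is strict (>) or inclusive (>=) is a guess. Only the two constant names and their values (128 rows, 1024*1024 bytes) come from the commit messages.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Names and values taken from the commit messages above.
constexpr std::size_t batch_size = 128;               // max rows per build step
constexpr std::size_t batch_memory_max = 1024 * 1024; // 1 MB size threshold

// Hypothetical helper (not part of Scylla): splits rows into batches,
// closing a batch as soon as it holds batch_size rows or its accumulated
// byte size exceeds batch_memory_max, whichever happens first.
std::vector<std::vector<std::string>>
split_into_batches(const std::vector<std::string>& rows) {
    std::vector<std::vector<std::string>> batches;
    std::vector<std::string> current;
    std::size_t current_bytes = 0;
    for (const auto& row : rows) {
        current.push_back(row);
        current_bytes += row.size();
        if (current.size() >= batch_size || current_bytes > batch_memory_max) {
            batches.push_back(std::move(current));
            current.clear(); // reset the moved-from vector to a known empty state
            current_bytes = 0;
        }
    }
    if (!current.empty()) {
        batches.push_back(std::move(current)); // flush the final partial batch
    }
    return batches;
}
```

With small rows the row-count limit dominates, so 300 ten-byte rows split into batches of 128, 128 and 44. With 300 KB rows the size limit dominates, and a batch closes after four rows, well below the 32 MB mutation limit the commit message mentions.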

1284 Commits