The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema that was used to
compile the query. The other node expects to receive data conforming
to the requested schema.
The shard-level interface accepts a schema_ptr; across nodes we use a
table_schema_version UUID. To transfer a schema_ptr across shards, we
use global_schema_ptr.
Because a schema is identified by UUID across nodes, requestors must
be prepared to be queried for the definition of that schema. They
must hold a live schema_ptr for the duration of the request, which
guarantees that the schema_registry will always know about the
requested version. This is not a burden, because for queries the
requestor needs to hold on to the schema anyway to be able to
interpret the results. But care must be taken to always use the same
schema version for making the request and parsing the results.
Schema requesting across nodes is currently stubbed (throws runtime
exception).
With 10 sstables/shard and 50 shards, we get ~10*50*50 = 25,000 log
messages about sstables being ignored. This is not reasonable.
Reduce the log level to debug, and move the message to database.cc:
at its original location, the containing function has nothing to do
with the message itself.
Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Message-Id: <1452181687-7665-1-git-send-email-avi@scylladb.com>
We have an API that wraps open_file_dma which we use in some places, but in
many other places we call the reactor version directly.
This patch changes the latter to match the former, with the added benefit
of making it easier to change these interfaces later if needed.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <29296e4ec6f5e84361992028fe3f27adc569f139.1451950408.git.glauber@scylladb.com>
This exception was not caught as a std::exception by
report_failed_future's call to report_exception, because the
std::exception base class was not initialized.
Fixes #669.
Signed-off-by: Benoît Canet <benoit@scylladb.com>
max_purgeable was being incorrectly calculated because the code
that creates the vector of uncompacted sstables was wrong.
This value is used to determine whether or not a tombstone can
be purged.
Operator < is supposed to be used instead in the callback passed
as the third parameter to boost::set_difference.
This fix is a step towards closing issue #676.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Use steady_clock instead of high_resolution_clock where a monotonic
clock is required. high_resolution_clock is essentially a
system_clock (wall clock) and therefore must not be assumed
monotonic, since the wall clock may move backwards due to time/date
adjustments.
Fixes issue #638
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
From Paweł:
"This series fixes sstables::key_reader not respecting range inclusiveness
if the bounds were the keys that were present in the index summary.
Fixes #663."
When choosing the relevant range of buckets, it wasn't taken into
account whether the range bounds are inclusive or not. That may have
resulted in more buckets being read than necessary, which was a
condition not expected by the code responsible for looking for the
relevant keys inside the buckets.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
I am sure it's a compiler issue but I am not ready to give up and
upgrade just yet:
sstables/compaction.cc:307:55: error: converting to ‘std::unordered_map<int, long int>’ from initializer list would use explicit constructor ‘std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::unordered_map(std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type, const hasher&, const key_equal&, const allocator_type&) [with _Key = int; _Tp = long int; _Hash = std::hash<int>; _Pred = std::equal_to<int>; _Alloc = std::allocator<std::pair<const int, long int> >; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type = long unsigned int; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::hasher = std::hash<int>; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::key_equal = std::equal_to<int>; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::allocator_type = std::allocator<std::pair<const int, long int> >]’
stats->start_size, stats->end_size, {});
That's important for the compaction stats API, which will need the
stats data of each ongoing compaction.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
When a compaction job finishes, call a function to update the system
table COMPACTION_HISTORY. That's also needed for the compaction
history API.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Schemas using compact storage can have clustering keys with the trailing
components not set, effectively making them clustering key prefixes
instead of full clustering keys.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
In the case of non-compound dense tables, the column name is just the
value of the clustering key (which has only one component). The current
code just casts clustering_key to bytes_view, which works because there
is no additional metadata in single-element clustering keys.
However, that may change when the internal representation of the
clustering key changes, so explicitly extract the proper component.
This change will become necessary when clustering_key is replaced by
clustering_key_prefix.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Similar to Origin's off-heap memory accounting, memory_footprint is the
size of the queues multiplied by the structure size.
memory_footprint is used by the API to report the memory that is taken
by the summary.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Scylla changes:
sstable.cc: Remove file_exists() function which conflicts with seastar's
Amnon Heiman (2):
reactor: Add file_exists method
Add a wrapper for file_exists
Avi Kivity (2):
Merge "Introduce shared_future" from Tomasz
Merge "scripts: a few fixes in posix_net_conf.sh" from Vlad
Gleb Natapov (3):
rpc: not stop client in error state
avoid allocation in parallel_for_each if there is nothing to do
memory: fix size_to_idx calculation
Nadav Har'El (1):
test: fix use-after-free in timertest
Paweł Dziepak (1):
memory: use size instead of old_size to shrink memory block
Tomasz Grabiec (7):
file: Mark move constructor as noexcept
core: future: Add static asserts about type's noexcept guarantees
core: future: Drop now redundant move_noexcept flag
core: future_state: Make state getters non-destructive for non-rvalue-refs
core: future: Make get_available_state() noexcept
core: Introduce shared_future
Make json_return_type movable
Vlad Zolotarov (8):
scripts: posix_net_conf.sh: ban NIC IRQs from being moved by irqbalance
scripts: posix_net_conf.sh: exclude CPU0 siblings from RPS
scripts: posix_net_conf.sh: Configure XPS
scripts: posix_net_conf.sh: Add a new mode for MQ NICs
scripts: posix_net_conf.sh: increase some backlog sizes
core: to_sstring(): cleanup
core: to_sstring_strintf(): always use %g(or %lg) format for floating point values
core: prevent explicit calls for to_sstring_sprintf()
The add interface of the estimated histogram is confusing, as it is not
clear what units are used.
This patch removes the general add method and replaces it with add_nano,
which adds nanoseconds, and add, which takes a duration.
To be compatible with Origin, nanosecond values are translated to
microseconds.
Avi says:
"A small buffer size will hurt if we read a large file, but
a large buffer size won't hurt if we read a small file, since
we close it immediately."
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Let's move the code that prints that a compaction succeeded to after
the code that catches exceptions on either the read or write fiber.
Let's also get rid of done and use repeat instead in the read fiber.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently, we don't even let the user know the filename that failed.
That information should be included in the message.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
This assert (in the write fiber) would fail if the read fiber failed,
because the variable done would not be set to true.
Using assert here is very bad, because it prevents Scylla from
proceeding even though proceeding is possible.
To solve it, let's throw an exception if done is not true.
We already have code that waits for both the read and write fibers
and catches exceptions, if any.
Closes #523.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
We use boost::any to convert to and from database values (stored in
serialized form) and native C++ values. boost::any captures information
about the data type (how to copy/move/delete etc.) and stores it inside
the boost::any instance. We later retrieve the real value using
boost::any_cast.
However, data_value (which has a boost::any member) already has type
information as a data_type instance. By teaching data_type instances
about the corresponding native type, we can eliminate the use of
boost::any.
While boost::any is evil and eliminating it improves efficiency somewhat,
the real goal is growing native type support in data_type. We will use that
later to store native types in the cache, enabling O(log n) access to
collections, O(1) access to tuples, and more efficient large blob support.
Now that #475 is solved and read_indexes() is guaranteed to return
disjoint sets of keys, the sstable key reader can be simplified:
only two key lookups are needed (the first and the last one), and
there is no need for range splitting.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
sstable level is set to zero by default, but it may be set to
a different value if a new sstable is the result of leveled
compaction. This is done outside write_components.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
We were incorrectly setting s.header.min_index_interval to
BASE_SAMPLING_LEVEL, which luckily is the default value of the
min index interval. BASE_SAMPLING_LEVEL was also used as the
min index interval when checking whether the estimated number
of summary entries is greater than the limit.
To fix both problems, get the min index interval from the schema
and use that value to check the limit.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
read_indexes() will not work for a column family whose minimum
index interval is different from the sampling level, or whose
sampling level is lower than BASE_SAMPLING_LEVEL.
That's because the function was using the sampling level to determine
the interval between the indexes stored by the index summary.
Instead, a method from downsampling will be used to calculate the
effective interval based on both the minimum_index_interval and
sampling_level parameters.
Fixes issue #474.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
"This patchset implements load_new_sstables, allowing one to move tables inside the
data directory of a CF, and then call "nodetool refresh" to start using them.
Keep in mind that for Cassandra, this is deemed an unsafe operation:
https://issues.apache.org/jira/browse/CASSANDRA-6245
It is still something we should not recommend - unless the CF is totally
empty and not yet used - but we can do a much better job on the safety front.
To guarantee that, the process works in four steps:
1) All writes to this specific column family are disabled. This is a horrible thing to
do, because dirty memory can grow much more than desired during this period. Throughout
this implementation, we will try to keep the time during which the writes are disabled
to its bare minimum.
While disabling the writes, each shard will tell us about the highest generation number
it has seen.
2) We will scan all tables that we haven't seen before. Those are any tables found in the
CF datadir that are higher than the highest generation number seen so far. We will link
them to new generation numbers that are sequential to the ones we have so far, and end up
with a new generation number that is returned to the next step.
3) The generation number computed in the previous step is now propagated to all CFs, which
guarantees that all further writes will pick generation numbers that won't conflict with
the existing tables. Right after doing that, the writes are resumed.
4) The tables we found in step 2 are passed on to each of the CFs. They can now load those
tables while operations to the CF proceed normally."
This will be used, for instance, when importing an SSTable.
We would like to force all new SSTables to sit at level 0 for
compaction purposes.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
In some situations (restoring a backup from load_new_sstables), we want to
change the SSTable generation number. This patch provides a procedure to
achieve that.
It does so by linking the old files to new ones, and then removing the old
ones.
The reason we link first instead of renaming is that we want to make
sure that, in case there is a crash in the middle, the old data is
still accessible.
If the crash happens after the link is done but before we start removing the
old files, that is fine: we will end up with duplicated data that will
disappear after the next compaction.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
That is the way to generate groups of files for the SSTables, so we must do it.
Because the links were mostly used by processes like snapshots and
backups, where an external tool would (hopefully) verify the results,
this was not that serious.
But we now plan to use links to bring things into the main directory,
so it must absolutely be done right.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
In some situations (restoring a snapshot, for instance) we may want a file
to get a different generation. This patch changes the code in create_links
slightly, so that it is able to link not only to a different location, but
also to files with a different name, possibly in the same location - which
is equivalent to a generation change.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
This is done on behalf of load_new_sstables: we would like to know which
components are present in the file, without triggering a read of the
rest of the metadata.
As noted by Avi, using this directly can leave the SSTable in an
inconsistent state. We will have to fix it later, since this is not the
first offender.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
Avoid using long for it; use a fixed-size type instead. Let's make it
signed rather than unsigned to avoid upsetting any code that we may
have converted.
Signed-off-by: Glauber Costa <glommer@scylladb.com>