scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-01 21:55:50 +00:00

Author	SHA1	Message	Date
Avi Kivity	a4bd56ce40	tests: fix partitioner_test build on gcc 5	2017-06-13 21:56:02 +03:00
Calle Wilund	6340fe61af	commitlog_test: Fix test_commitlog_delete_when_over_disk_limit Test should a.) Wait for the flush semaphore b.) Only compare segment sets between start and end, not start, end and in between. I.e. the test sort of assumed we started with < 2 (or so) segments. Not always the case (timing) Message-Id: <1496828317-14375-1-git-send-email-calle@scylladb.com> (cherry picked from commit `0c598e5645`)	2017-06-13 19:53:13 +03:00
Asias He	f2317a6f3f	repair: Fix range use after free Capture it by value. scylla: [shard 0] repair - repair's stream failed: streaming::stream_exception (Stream failed) scylla: [shard 0] repair - Failed sync of range ==<runtime_exception (runtime error: Invalid token. Should have size 8, has size 0#012)>: streaming::stream_exception (Stream failed) Message-Id: <7fda4432e54365f64b556e7e4c26e36d3a9bb1b7.1497238229.git.asias@scylladb.com> (cherry picked from commit `2bcb368a13`)	2017-06-13 11:03:14 +03:00
Paweł Dziepak	7bb41b50f9	commitlog: avoid copying column_mapping It is safe to copy column_mapping across shards. Such guarantee comes at the cost of performance. This patch makes commitlog_entry_writer use IDL generated writer to serialise commitlog_entry so that column_mapping is not copied. This also simplifies commitlog_entry itself. Performance difference tested with: perf_simple_query -c4 --write --duration 60 (medians) before after diff write 79434.35 89247.54 +12.3% (cherry picked from commit `374c8a56ac`) Also: Fixes #2468.	2017-06-11 15:44:20 +03:00
Paweł Dziepak	57d602fdd6	idl: fix generated writers when member functions are used When using member name in an identifier of generated class or method idl compiler should strip the trailing '()'. (cherry picked from commit `4df4994b71`) (part of #2468)	2017-06-11 15:43:53 +03:00
Paweł Dziepak	cd14b83192	idl: add start_frame() overload for seastar::simple_output_stream (cherry picked from commit `018d16d315`) (part of #2468)	2017-06-11 15:43:11 +03:00
Avi Kivity	a85b70d846	Merge "repair memory usage fix" from Asias "This series switches repair to use more stream plans to stream the mismatched sub ranges and use a range generator to produce sub ranges. Test shows no huge memory is used for repair with large data set. In addition, we now have a progress reporter in the log how many ranges are processed. Jun 06 14:18:22 [shard 0] repair - Repair 512 out of 529 ranges, id=1, keyspace=myks, cf=mytable, range=(8526136029525195375, 8549482295083869942] Jun 06 14:19:55 [shard 0] repair - Repair 513 out of 529 ranges, id=1, keyspace=myks, cf=mytable, range=(8526136029525195375, 8549482295083869942] Fixes #2430." * tag 'asias/fix-repair-2430-branch-master-v1' of github.com:cloudius-systems/seastar-dev: repair: Remove unused sub_ranges_max repair: Reduce parallelism in repair_ranges repair: Tweak the log a bit repair: Use more stream_plan repair: iterator over subranges instead of list (cherry picked from commit `419ad9d6cb`)	2017-06-08 14:52:28 +03:00
Avi Kivity	f44ea5335b	Update seastar submodule * seastar 812e232...18a82e2 (1): > scripts: posix_net_conf.sh: fix bash syntax causing a failure during bonding iface configuration Fixes #2269	2017-06-07 18:23:02 +03:00
Pekka Enberg	a95c045b48	Merge "Fixes to thrift/server" from Duarte "This series fixes some issues with the thrift_server, namely ensuring that streams and sockets are properly closed. Fixes #499 Fixes #2437" * 'thrift-server-fixes/v1' of github.com:duarten/scylla: thrift/server: Close connections when stopping server thrift/server: Move connection class to header thrift/server: Shutdown connection thrift/server: Close output_stream when connection is done (cherry picked from commit `a6dc21615b`)	2017-06-07 16:08:28 +03:00
Avi Kivity	eb396d2795	Update seastar submodule * seastar 328fdbc...812e232 (1): > rpc: handle messages larger than memory limit Fixes #2453.	2017-06-07 12:29:59 +03:00
Takuya ASADA	dbbf99d7fa	dist/debian: install gdebi when it's not exist Since we started to use gdebi for install build-dep metapackage that generated by mk-build-dep, we need to install gdebi on build_deb.sh too. Fixes #2451 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1496819209-30318-1-git-send-email-syuu@scylladb.com> (cherry picked from commit `7fe63c539a`) scylla-1.7.1	2017-06-07 10:25:02 +03:00
Raphael S. Carvalho	f7a143e7be	sstables: fix report of disk space used by bloom filter After change in boot, read_filter is called by distributed loader, so its update to _filter_file_size is lost. The load variant which receives foreign components that must do it. We were also not updating it for newly created sstables. Fixes #2449. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170606151129.5477-1-raphaelsc@scylladb.com> (cherry picked from commit `0ca1e5cca3`)	2017-06-06 19:00:00 +03:00
Takuya ASADA	562102cc76	dist/debian: use gdebi instead of mk-build-deps -i At least on Debian8, mk-build-deps -i silently finishes with return code 0 even it fails to install dependencies. To prevent this, we should manually install the metapackage generated by mk-build-deps using gdebi. Fixes #2445 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1496737502-10737-2-git-send-email-syuu@scylladb.com> (cherry picked from commit `a4c392c113`)	2017-06-06 14:18:14 +03:00
Takuya ASADA	d4b444418a	dist/debian/dep: install texlive from jessie-backports to prevent gdb build fail on jessie Installing openjdk-8-jre-headless from jessie-backports breaks texlive on jessie main repo. It causes 'Unmet build dependencies' error when building gdb package. To prevent this, force installing texlive from jessie-backports before start building gdb. Fixes #2444 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1496737502-10737-1-git-send-email-syuu@scylladb.com> (cherry picked from commit `5608842e96`)	2017-06-06 14:18:08 +03:00
Raphael S. Carvalho	befd4c9819	db: fix computation of live disk usage stat after compaction sstable::data_size() is used by rebuild_statistics() which only returns uncompressed data size, and the function called by it expects actual disk space used by all components. Boot uses add_sstable() which correctly updates the stat with sstable::bytes_on_disk(). That's what needs to be used by r__s() too. Fixes #1592 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170525210055.6391-1-raphaelsc@scylladb.com> (cherry picked from commit `3b5ad23532`)	2017-05-28 10:39:14 +03:00
Avi Kivity	eb2fe0fbd3	Merge "reduce memory requirement for loading sstables" from Raphael "fixes a problem in which memory requirement for loading in-memory components of sstables is very high due to unlimited parallelism." * 'mem_requirement_sstable_load_v2_2' of github.com:raphaelsc/scylla: database: fix indentation of distributed_loader::open_sstable database: reduce memory requirement to load sstables sstables: loads components for a sstable in parallel sstables: enable read ahead for read of in-memory components sstables: make random_access_reader work with read ahead (cherry picked from commit `ef428d008c`)	2017-05-25 12:59:55 +03:00
Raphael S. Carvalho	eb6b0b1267	db: remove partial sstable created by memtable flush which failed partial sstable files aren't being removed after each failed attempt to flush memtable, which happens periodically. If the cause of the failure is ENOSPC, memtable flush will be attempted forever, and as a result, column family may be left with a huge amount of partial files which will overwhelm subsequent boot when removing temporary TOC. In the past, it led to OOM because removal of temporary TOC took place in parallel. Fixes #2407. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170525015455.23776-1-raphaelsc@scylladb.com> (cherry picked from commit `b7e1575ad4`)	2017-05-25 11:50:17 +03:00
Asias He	7836600ded	streaming: Do not abort session too early in idle detection Streaming usually takes a long time to complete. Aborting it on a false positive idle detection can be very wasteful. Increase the abort timeout from 10 minutes to a very large timeout, 300 minutes. The real idle session will be aborted eventually if other mechanisms, e.g., streaming manager has gossip callback for on_remove and on_restart event to abort, do not abort the session. Fixes #2197 Message-Id: <57f81bfebfdc6f42164de5a84733097c001b394e.1494552921.git.asias@scylladb.com> (cherry picked from commit `f792c78c96`)	2017-05-24 12:30:47 +03:00
Shlomi Livne	230c33da49	release: prepare for 1.7.1 Signed-off-by: Shlomi Livne <shlomi@scylladb.com>	2017-05-23 22:42:52 +03:00
Raphael S. Carvalho	17d8a0c727	compaction: do not write expired cell as dead cell if it can be purged right away When compacting a fully expired sstable, we're not allowing that sstable to be purged because expired cell is unconditionally converted into a dead cell. Why not check if the expired cell can be purged instead using gc before and max purgeable timestamp? Currently, we need two compactions to get rid of a fully expired sstable which cells could have always been purged. look at this sstable with expired cell: { "partition" : { "key" : [ "2" ], "position" : 0 }, "rows" : [ { "type" : "row", "position" : 120, "liveness_info" : { "tstamp" : "2017-04-09T17:07:12.702597Z", "ttl" : 20, "expires_at" : "2017-04-09T17:07:32Z", "expired" : true }, "cells" : [ { "name" : "country", "value" : "1" }, ] now this sstable data after first compaction: [shard 0] compaction - Compacted 1 sstables to [...]. 120 bytes to 79 (~65% of original) in 229ms = 0.000328997MB/s. { ... "rows" : [ { "type" : "row", "position" : 79, "cells" : [ { "name" : "country", "deletion_info" : { "local_delete_time" : "2017-04-09T17:07:12Z" }, "tstamp" : "2017-04-09T17:07:12.702597Z" }, ] now another compaction will actually get rid of data: compaction - Compacted 1 sstables to []. 79 bytes to 0 (~0% of original) in 1ms = 0MB/s. ~2 total partitions merged to 0 NOTE: It's a waste of time to wait for second compaction because the expired cell could have been purged at first compaction because it satisfied gc_before and max purgeable timestamp. Fixes #2249, #2253 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170413001049.9663-1-raphaelsc@scylladb.com> (cherry picked from commit `a6f8f4fe24`)	2017-05-23 20:57:54 +03:00
Tomasz Grabiec	064de6f8de	row_cache: Fix undefined behavior in read_wide() _underlying is created with _range, which is captured by reference. But range_and_underlyig_reader is moved after being constructed by do_with(), so _range reference is invalidated. Fixes #2377. Message-Id: <1494492025-18091-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `0351ab8bc6`)	2017-05-21 19:09:03 +03:00
Gleb Natapov	df56c108b7	database: remove temporary sstables sequentially The code that removes each sstable runs in a thread. Parallel removing of a lot of sstables may start a lot of threads each of which is taking 128k for its stack. There is no much benefit in running deletion in parallel anyway, so fix it by deleting sstables sequentially. Fixes #2384 Message-Id: <20170516103018.GQ3874@scylladb.com> (cherry picked from commit `c7ad3b9959`)	2017-05-21 18:56:22 +03:00
Tomasz Grabiec	25607ab9df	range: Fix SFINAE rule for picking the best do_lower_bound()/do_upper_bound() overload mutation_partition has a slicing constructor which is supposed to copy only the rows from the query range. The rows are located using nonwrapping_range::lower_bound() and nonwrapping_range::upper_bound(). Those two have two different implementations chosen with SFINAE. One is using std::lower_bound(), and one is using container's built in lower_bound() should it exist. We're using intrusive tree in mutation_partition, so container's lower_bound() is preferred. It's O(log N) whereas std::lower_bound() is O(N), because tree's iterator is not random access. However, the current rule for picking container's lower_bound() never triggers, because lower_bound() has two overloads in the container: ./range.hh:618:14: error: decltype cannot resolve address of overloaded function typename = decltype(&std::remove_reference<Range>::type::upper_bound)> ^~~~~~~~ As a result, the overload which uses std::lower_bound() is used. Spotted when running perf_fast_forward with wide partition limit in cache lifted off. It's so slow that I timed out waiting for the result (> 16 min). Fixes #2395. Message-Id: <1495048614-9913-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `3fc1703ccf`) scylla-1.7.0	2017-05-18 17:12:00 +03:00
Avi Kivity	b26bd8bbeb	tests: fix partitioner_test for g++ 5 It can't make the leap from dht::ring_position to stdx::optional<range_bound<dht::ring_position>> for some reason. (cherry picked from commit `ba31619594`)	2017-05-18 13:10:48 +03:00
Avi Kivity	1ca7f5458b	Update seastar submodule > tls: make shutdown/close do "clean" handshake shutdown in background > tls: Make sink/source (i.e. streams) first class channel owners > native-stack: Make sink/source (i.e. streams) first class channel owners More close() fixes, pointed out by Tomek.	2017-05-17 19:01:44 +03:00
Calle Wilund	50c8a08e91	scylla: fix compilation errors on gcc 5 Message-Id: <1495030581-2138-1-git-send-email-calle@scylladb.com> (cherry picked from commit `6ca07f16c1`)	2017-05-17 18:04:58 +03:00
Avi Kivity	9d1b9084ed	Update seastar submodule * seastar bfa1cb2...774c09c (1): > posix-stack: Make sink/source (i.e. streams) first class channel owners	2017-05-17 16:44:34 +03:00
Tomasz Grabiec	e2c75d8532	Merge "Fix performance problems with high shard counts tag" from Avi From http://github.com/avikivity/scylla exponential-sharder/v3. The sharder, which takes a range of tokens and splits it among shards, is slow with large shard count and the default murmur3_partitioner_ignore_msb_bits. This patchset fixes excessive iteration in sstable sharding metadata writer and nonsingular range scans. Without this patchset, sealing a memtable takes > 60 ms on a 48-shard system. With the patchset, it drops below the latency tracker threshold I used (5 ms). Fixes #2392. (cherry picked from commit `84648f73ef`)	2017-05-17 16:19:24 +03:00
Duarte Nunes	59063f4891	tests: Add test case for nonwrapping_range::intersection() Signed-off-by: Duarte Nunes <duarte@scylladb.com> (cherry picked from commit `f365b7f1f7`)	2017-05-17 15:59:06 +03:00
Duarte Nunes	de79792373	nonwrapping_range: Add intersection() function intersection() returns an optional range with the intersection of the this range and the other, specified range. Signed-off-by: Duarte Nunes <duarte@scylladb.com> (cherry picked from commit `1f9359efba`)	2017-05-17 15:58:55 +03:00
Avi Kivity	3557b449ac	Merge "Adding private repository to housekeeping" from Amnon "This series adds private repository support to scylla-housekeeping" * 'amnon/housekeeping_private_repo_v3' of github.com:cloudius-systems/seastar-dev: scylla-housekeeping service: Support private repositories scylla-housekeeping-upstart: Use repository id, when checking for version scylla-housekeeping: support private repositories (cherry picked from commit `eb69fe78a4`)	2017-05-17 15:58:29 +03:00
Pekka Enberg	a8e89d624a	cql3: Fix variable_specifications class get_partition_key_bind_indexes() The "_specs" array contains column specifications that have the bind marker name if there is one. That results in get_partition_key_bind_indices() not being able to look up a column definition for such columns. Fix the issue by keeping track of the actual column specifications passed to add() like Cassandra does. Fixes #2369 (cherry picked from commit a45e656efb4c6478d80e4dfc18de99b94712eeba)	2017-05-10 10:00:47 +03:00
Pekka Enberg	31cd6914a8	cql3: Move variable_specifications implementation to source file Move the class implementation to source file to reduce the need to recompile everything when the implementation changes... Message-Id: <1494312003-8428-1-git-send-email-penberg@scylladb.com> (cherry picked from commit `5b931268d4`)	2017-05-10 10:00:31 +03:00
Pekka Enberg	a441f889c3	cql3: Fix partition key bind indices for prepared statements Fix the CQL front-end to populate the partition key bind index array in result message prepared metadata, which is needed for CQL binary protocol v4 to function correctly. Fixes #2355. (cherry picked from commit ebd76617276e660c590cec0a07e97e82422111df) Tested-by: Shlomi Livne <shlomi@scylladb.com> Message-Id: <1494257274-1189-1-git-send-email-penberg@scylladb.com>	2017-05-10 10:00:21 +03:00
Pekka Enberg	91b7cb8576	Merge "gossip mark alive fixes" from Asias "This series fixes the use after free issue in gossip and eliminates the duplicated / unnecessary mark alive operations. Fixes #2341" * tag 'asias/gossip_fix_mark_alive/v1' of github.com:cloudius-systems/seastar-dev: gossip: Ignore callbacks and mark alive operation in shadow round gossip: Ignore the duplicated mark alive operation gossip: Fix use after free in mark_alive (cherry picked from commit `1e04731fa0`)	2017-05-09 01:57:23 +03:00
Avi Kivity	2b17c4aacf	Merge "Fix update of counter in static rows" from Paweł "The logic responsible for converting counter updates to counter shards was not covered by unit tests and didn't transform counter cells inside static rows. This series fixes the problem and makes sure that the tests cover both static rows and transformation logic. Fixes #2334." * tag 'pdziepak/static-counter-updates-1.7/v1' of github.com:cloudius-systems/seastar-dev: tests/counter: test transform_counter_updates_to_shards tests/counter: test static columns counters: transform static rows from updates to shards	2017-05-06 15:54:20 +03:00
Pekka Enberg	f61d9ac632	release: prepare for 1.7.0	2017-05-04 15:28:28 +03:00
Asias He	fc9db8bb03	repair: Fix partition estimation We estimate number of partitions for a given range of a column family and split the range into sub ranges that contain fewer partitions as a checksum unit. The estimation is wrong, because we need to count the partitions on all the shards, instead of only counting the local shard. Fixes #2299 Message-Id: <7876285bd26cfaf65563d6e03ec541626814118a.1493817339.git.asias@scylladb.com> (cherry picked from commit `66e3b73b9c`)	2017-05-03 16:26:01 +03:00
Paweł Dziepak	bd67d23927	tests/counter: test transform_counter_updates_to_shards	2017-05-02 13:49:43 +01:00
Paweł Dziepak	bdeeebbd74	tests/counter: test static columns	2017-05-02 13:49:43 +01:00
Paweł Dziepak	a1cb29e7ec	counters: transform static rows from updates to shards	2017-05-02 13:49:43 +01:00
Amnon Heiman	e8369644fd	scylla_setup: Fix conditional when checking for newer version During the changes in the way the housekeeping check for newer version and warn about it in the installation the UUID part was removed but kept in the surrounding if. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20170426075724.7132-1-amnon@scylladb.com> (cherry picked from commit `b59c95359d`)	2017-05-01 12:14:04 +03:00
Glauber Costa	a36cabdb30	reduce kernel scheduler wakeup granularity We set the scheduler wakeup granularity to 500usec, because that is the difference in runtime we want to see from a waking task before it preempts the running task (which will usually be Scylla). Scheduling other processes less often is usually good for Scylla, but in this case, one of the "other processes" is also a Scylla thread, the one we have been using for marking ticks after we have abandoned signals. However, there is an artifact from the Linux scheduler that causes those preemptions to be missed if the wakeup granularity is exactly twice as small as the sched_latency. Our sched_latency is set to 1ms, which represents the maximum time period in which we will run all runnable tasks. We want to keep the sched_latency at 1ms, so we will reduce the wakeup granularity to something slightly lower than 500usec, to make sure that such artifact won't affect the scheduler calculations. 499.99usec will do - according to my tests, but we will reduce it to a round number. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20170427135039.8350-1-glauber@scylladb.com> (cherry picked from commit `14b9aa2285`)	2017-05-01 11:13:51 +03:00
Raphael S. Carvalho	1d26fab73e	sstables: add method to export ancestors Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-05-01 11:09:42 +03:00
Shlomi Livne	5f0c635da7	release: prepare for 1.7.rc3 Signed-off-by: Shlomi Livne <shlomi@scylladb.com>	2017-05-01 09:53:20 +03:00
Raphael S. Carvalho	82cc3d7aa5	dtcs: do not compact fully expired sstable which ancestor is not deleted yet Currently, fully expired sstable[1] is unconditionally chosen for compaction by DTCS, but that may lead to a compaction loop under certain conditions. Let's consider that an almost expired sstable is compacted, and it's not deleted yet, and that the new sstable becomes expired before its ancestor is deleted. Because this new sstable is expired, it will be chosen by DTCS, but it will not be purged because 'compacted undeleted' sstables are taken into account by calculation of max purgeable timestamp and prevents expired data from being purged. The problem is that this sequence of events can keep happening forever as reported by issue #2260. NOTE: This problem was easier to reproduce before improvement on compaction of expired cells, because fully expired sstable was being converted into a sstable full of tombstones, which is also considered fully expired. Fixes #2260. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170428233554.13744-1-raphaelsc@scylladb.com> (cherry picked from commit `687a4bb0c2`)	2017-04-30 19:36:00 +03:00
Paweł Dziepak	98d782cfe1	db: make virtual dirty soft limit configurable Message-Id: <20170428150005.28454-1-pdziepak@scylladb.com> (cherry picked from commit `24f4dcf9e4`)	2017-04-30 19:17:55 +03:00
Avi Kivity	ea0591ad3d	Merge "Fix problems with slicing using sstable's promoted index" from Tomasz "Fixes #2327. Fixes #2326." * 'tgrabiec/fix-promoted-index-parsing-1.7' of github.com:cloudius-systems/seastar-dev: sstables: Fix incorrect parsing of cell names in promoted index sstables: Fix find_disk_ranges() to not miss relevant range tombstones	2017-04-30 14:48:54 +03:00
Paweł Dziepak	7eedd743bf	lsa: introduce upper bound on zone size Attempting to create huge zones may introduce significant latency. This patch introduces the maximum allowed zone size so that the time spent trying to allocate and initialising zone is bounded. Fixes #2335. Message-Id: <20170428145916.28093-1-pdziepak@scylladb.com> (cherry picked from commit `f5cf86484e`)	2017-04-30 10:58:34 +03:00
Tomasz Grabiec	8a21961ec9	sstables: Fix incorrect parsing of cell names in promoted index Range tombstones are serialized to cell names in this place: _sst.maybe_flush_pi_block(_out, start, {}); Note that the column set is empty. This is correct. A range tombstone only has a clustering part. The cell name is deserialized by promoted index reader using mp_row_consumer::column, like this: mp_row_consumer::column col(schema, std::move(col_name), api::max_timestamp); return std::move(col.clustering); The problem is, column constructor assumes that there is always a component corresponding to a cell name if the table is not dense, and will pop it from the set of components (the clustering field): , cell(!schema.is_dense() ? pop_back(clustering) : (*(schema.regular_begin())).name()) promoted index block which starts or ends with a range tombstone will appear as having incorrect bounds. This may result in an incorrect value for data file range start to be calculated. Fixes #2327.	2017-04-27 18:30:00 +02:00
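
Note on commit 25607ab9df (range: Fix SFINAE rule ...): the message explains that a rule based on `decltype(&Container::lower_bound)` never matches when the member is overloaded, so the slower `std::lower_bound()` path is always taken. The sketch below is not Scylla's code; it is a minimal illustration, under the assumption that detecting the call expression instead of the member's address is an acceptable fix. The names `has_member_lower_bound`, `do_lower_bound_sketch`, `best_lower_bound` and `example_usage` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait (not from Scylla): true when `c.lower_bound(key)` is a
// valid call. Testing the call expression still works when the member is
// overloaded, whereas decltype(&C::lower_bound) cannot pick one overload.
template <typename C, typename K>
struct has_member_lower_bound {
    template <typename T>
    static auto test(int)
        -> decltype(std::declval<T&>().lower_bound(std::declval<const K&>()),
                    std::true_type());
    template <typename T>
    static std::false_type test(...);
    static constexpr bool value = decltype(test<C>(0))::value;
};

// Selected when the container has its own lower_bound(), e.g. a tree.
template <typename C, typename K>
auto do_lower_bound_sketch(C& c, const K& key, std::true_type)
    -> decltype(c.begin()) {
    return c.lower_bound(key);
}

// Fallback: generic std::lower_bound() over a sorted sequence.
template <typename C, typename K>
auto do_lower_bound_sketch(C& c, const K& key, std::false_type)
    -> decltype(c.begin()) {
    return std::lower_bound(c.begin(), c.end(), key);
}

// Dispatches on the trait so the member function is preferred when present.
template <typename C, typename K>
auto best_lower_bound(C& c, const K& key) -> decltype(c.begin()) {
    return do_lower_bound_sketch(
        c, key,
        std::integral_constant<bool, has_member_lower_bound<C, K>::value>());
}

// Small usage example covering both paths.
inline void example_usage() {
    std::set<int> tree{1, 3, 5};
    std::vector<int> sorted{1, 3, 5};
    static_assert(has_member_lower_bound<std::set<int>, int>::value,
                  "std::set provides lower_bound()");
    static_assert(!has_member_lower_bound<std::vector<int>, int>::value,
                  "std::vector does not provide lower_bound()");
    assert(*best_lower_bound(tree, 2) == 3);    // member lower_bound()
    assert(*best_lower_bound(sorted, 4) == 5);  // std::lower_bound()
}
```

Tag dispatch on `std::true_type`/`std::false_type` keeps the sketch valid in C++11; with C++17 the same choice could be written with `if constexpr`.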


11372 Commits