scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-01 21:55:50 +00:00

Author	SHA1	Message	Date
Glauber Costa	3c988e8240	perf_sstable: use current scylla default directory When this tool was written, we were still using /var/lib/cassandra as a default location. We should update it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2015-12-09 17:46:31 +02:00
Avi Kivity	01c3670def	Merge seastar upstream * seastar 5dc22fa...c5e595b (3): > memory: be less strict about NUMA bindings > reactor: let the resource code specify the default memory reserve > resource: reserve even more memory when hwloc is compiled in Fixes #642	2015-12-09 16:47:47 +02:00
Asias He	66938ac129	streaming: Add retransmit logic for streaming verbs Retransmit streaming related verbs and give up in 5 minutes. Tested with: lein test :only cassandra.batch-test/batch-halves-decommission Fixes #568.	2015-12-09 15:12:36 +02:00
Avi Kivity	14794af260	Merge seastar upstream * seastar 9f9182e...5dc22fa (1): > future: add repeat_until_value(): repeat an action until it returns a value	2015-12-09 15:11:59 +02:00
Avi Kivity	213700e42f	Merge seastar upstream * seastar d40453b...9f9182e (5): > Merge "Sleep mode support" > future: add futurize<T>::from_tuple(tuple<T>) > tls: Add missing destructor for dh_params::impl, fixes ASAN error > tls/socket fix: Add missing noexcept to constructor/move > Merge "Initial SSL/TLS socket support" from Calle	2015-12-09 11:01:13 +02:00
Avi Kivity	204610ac61	Merge "Make LSA more large-allocation-friendly" from Paweł "This series attempts to make LSA more friendly for large (i.e. bigger than an LSA segment) allocations. It is achieved by introducing segment zones – large, contiguous areas of segments and using them to allocate segments instead of calling malloc() directly. Zones can be shrunk when needed to reclaim memory and segments can be migrated either to reduce the number of zones or to defragment one in order to be able to shrink it. LSA tries to keep all segments at the lower addresses and reclaims memory starting from the zones in the highest parts of the address space."	2015-12-09 10:49:23 +02:00
Avi Kivity	883074e936	Merge "Fix replace_node support" from Asias Also: [PATCH scylla v1 0/7] gossip mark node down fix + cleanup [PATCH scylla v1 0/2] Refuse decommissioned node to rejoin [PATCH scylla] storage_service: Fix added node not showing up in nodetool in status joining	2015-12-09 10:42:52 +02:00
Paweł Dziepak	8ba66bb75d	managed_bytes: fix copy size in move constructor Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-09 10:38:28 +02:00
Asias He	b63d49c773	storage_service: Log removing replaced endpoint from system.peers This info is important when replacing a node. Useful for debugging.	2015-12-09 12:30:52 +08:00
Asias He	d26c7e671d	storage_service: Enable commented out code in handle_state_normal Add current_owner to endpoints_to_remove if endpoint and current_owner have the same token and endpoint is newer than current_owner.	2015-12-09 12:30:52 +08:00
Asias He	3793bb7be1	token_metadata: Add get_endpoint_to_token_map_for_reading	2015-12-09 12:30:52 +08:00
Asias He	1cc7887ffb	token_metadata: Do nothing if tokens is empty. When replacing a node, we might ignore the tokens, leaving the token set empty. In this case, std::unordered_map<inet_address, std::unordered_set<token>> = {ip, {}} is passed to token_metadata::update_normal_tokens(std::unordered_map<inet_address, std::unordered_set<token>>& endpoint_tokens) and hits the assertion assert(!tokens.empty());	2015-12-09 12:30:52 +08:00
Asias He	e79c85964f	system_keyspace: Flush system.peers in remove_endpoint 1) Start node 1, node 2, node 3 2) Stop node 3 3) Start node 4 to replace node 3 4) Kill node 4 (removal of node 3 in system.peers is not flushed to disk) 5) Start node 4 (will load node 3's token and host_id info in bootup) This makes "Token .* changing ownership from 127.0.0.3 to 127.0.0.4" messages printed again in step 5) which are not expected, which fails the dtest FAIL: replace_first_boot_test (replace_address_test.TestReplaceAddress) ---------------------------------------------------------------------- Traceback (most recent call last): File "scylla-dtest/replace_address_test.py", line 220, in replace_first_boot_test self.assertEqual(len(movedTokensList), numNodes) AssertionError: 512 != 256	2015-12-09 12:30:52 +08:00
Asias He	110a18987e	token_metadata: Print Token changing ownership from Needed by test.	2015-12-09 12:30:52 +08:00
Asias He	906f670a86	gossip: Print node status in handle_major_state_change It is useful to know the STATUS value when debugging.	2015-12-09 12:29:15 +08:00
Asias He	a0325a5528	gossip: Simplify is_shutdown and friends. Use the newly added helper get_gossip_status.	2015-12-09 12:29:15 +08:00
Asias He	9d4382c626	gossip: Introduce get_gossip_status Get value of application_state::STATUS.	2015-12-09 12:29:15 +08:00
Asias He	5a65d8bcdd	gossip: Fix endless marking a node down In commit `56df32ba56` (gossip: Mark node as dead even if already left). A node liveness check is missed. Fix it up. Before: (mark a node down multiple times) [Tue Dec 8 12:16:33 2015] INFO [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN [Tue Dec 8 12:16:33 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead [Tue Dec 8 12:16:34 2015] INFO [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN [Tue Dec 8 12:16:34 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead [Tue Dec 8 12:16:35 2015] INFO [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN [Tue Dec 8 12:16:35 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead [Tue Dec 8 12:16:36 2015] INFO [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN [Tue Dec 8 12:16:36 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead After: (mark a node down only one time) [Tue Dec 8 12:28:36 2015] INFO [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN [Tue Dec 8 12:28:36 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead	2015-12-09 12:29:15 +08:00
Asias He	fa3c84db10	gossip: Kill default constructor for versioned_value The only reason we needed it is to make _application_state[key] = value work. With the current default constructor, we increase the version number needlessly. To fix and to be safe, remove the default constructor completely.	2015-12-09 12:29:15 +08:00
Asias He	52a5e954f9	gossip: Pass const ref for versioned_value in on_change and before_change	2015-12-09 12:29:15 +08:00
Asias He	3308430343	storage_service: Make before_change and on_change log print more informative - Make before_change and on_change print the versioned_value - Print endpoint address first in handle_state_* and on_change and friends.	2015-12-09 12:29:15 +08:00
Asias He	ccbd801f40	storage_service: Fix decommissioned nodes are willing to rejoin the cluster if restarted Backport: CASSANDRA-8801 a53a6ce Decommissioned nodes will not rejoin the cluster. Tested with: topology_test.py:TestTopology.decommissioned_node_cant_rejoin_test	2015-12-09 10:43:51 +08:00
Asias He	b3dd2d976a	storage_service: Simplify prepare_to_join with seastar thread	2015-12-09 10:43:51 +08:00
Asias He	e9a4d93d1b	storage_service: Fix added node not showing up in nodetool in status joining The get_token_endpoint API should return a map of tokens to endpoints, including the bootstrapping ones. Use get_local_storage_service().get_token_to_endpoint_map() for it. $ nodetool -p 7100 status Status=Up/Down \|/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 127.0.0.1 12645 256 ? eac5b6cf-5fda-4447-8104-a7bf3b773aba rack1 UN 127.0.0.2 12635 256 ? 2ad1b7df-c8ad-4cbc-b1f1-059121d2f0c7 rack1 UN 127.0.0.3 12624 256 ? 61f82ea7-637d-4083-acc9-567e0c01b490 rack1 UJ 127.0.0.4 ? 256 ? ced2725e-a5a4-4ac3-86de-e1c66cecfb8d rack1 Fixes #617	2015-12-09 10:43:51 +08:00
Paweł Dziepak	63bdf52803	tests/lsa: add large allocations test Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 23:56:46 +01:00
Tomasz Grabiec	d68a8b5349	Merge branch 'dev/amnon/index_summary_size_v2' from seastar-dev.git API for getting sstable index summary memory footprint from Amnon	2015-12-08 20:03:39 +01:00
Paweł Dziepak	73a1213160	scylla-gdb.py: print lsa zones Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	0d66300d43	lsa: add more counters Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	83b004b2fb	lsa: avoid fragmenting memory Originally, lsa allocated each segment independently, which could result in high memory fragmentation. As a result, many compaction and eviction passes may be needed to release a sufficiently big contiguous memory block. These problems are solved by the introduction of segment zones, contiguous groups of segments. All segments are allocated from zones and the algorithm tries to keep the number of zones to a minimum. Moreover, segments can be migrated between zones or inside a zone in order to deal with fragmentation inside a zone. Segment zones can be shrunk but cannot grow. The segment pool keeps a tree containing all zones ordered by their base addresses. This tree is used only by the memory reclaimer. There is also a list of zones that have at least one free segment, which is used during allocation. Segment allocation doesn't have any preference for which segment (and zone) to choose. Each zone contains a free list of unused segments. If there are no zones with free segments, a new one is created. Segment reclamation migrates segments from the zones higher in memory to the ones at lower addresses. The remaining zones are shrunk until the requested number of segments is reclaimed. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	6c4a54fb0b	tests: add tests for utils::dynamic_bitset Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	2fb14a10b6	utils: add dynamic_bitset A dynamic bitset implementation that provides functions to search for both set and cleared bits in both directions. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	40dda261f2	lsa: maintain segment to region mapping Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	c4e71bac7f	tests/row_cache_alloc_stress: make sure that allocation fails Currently test case "Testing reading when memory can't be reclaimed." assumes that the allocation section used by row cache upon entering will require more free memory than there is available (inc. evictable). However, the reserves used by allocation section are adjusted dynamically and depend solely on previous events. In other words there is no guarantee that the reserve would be increased so much that the allocation will fail. The problem is solved by adding another allocation that is guaranteed to be bigger than all evictable and free memory. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	2e94086a2c	lsa: use bi::list to implement segment_stack Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Tomasz Grabiec	6ead7a0ec5	Merge tag 'large-blobs/v3' from git@github.com:avikivity/scylla.git Scattering of blobs from Avi: This patchset converts the stack to scatter managed_bytes in lsa memory, allowing large blobs (and collections) to be stored in memtable and cache. Outside memtable/cache, they are still stored sequentially, but it is assumed that the number of transient objects is bounded. The approach taken here is to scatter managed_bytes data in multiple blob_storage objects, but to linearize them back when accessing (for example, to merge cells). This allows simple access through the normal bytes_view. It causes an extra two copies, but copying a megabyte twice is cheap compared to accessing a megabyte's worth of small cells, so per-byte throughput is increased. Testing shows that lsa large object space is kept at zero, but throughput is bad because Scylla easily overwhelms the disk with large blobs; we'll need Glauber's throttling patches or a really fast disk to see good throughput with this.	2015-12-08 19:15:13 +01:00
Avi Kivity	5c5331d910	tests: test large blobs in memtables	2015-12-08 15:17:09 +02:00
Avi Kivity	0c2fba7e0b	lsa: advertize our preferred maximum allocation size Let managed_bytes know that allocating below a tenth of the segment size is the right thing to do.	2015-12-08 15:17:09 +02:00
Avi Kivity	f9e2a9a086	mutation_partition: work on linearized atomic_cell_or_mutation objects Ensure that when we examine atomic_cell_or_mutation objects for merging, that they are contiguous in memory. When we are done we scatter them again.	2015-12-08 15:17:09 +02:00
Avi Kivity	ad975ad629	atomic_cell_or_collection: linearize(), unlinearize() Add linearize() and unlinearize() methods that allow making an atomic_cell_or_collection object temporarily contiguous, so we can examine it as a bytes_view.	2015-12-08 15:17:09 +02:00
Avi Kivity	13324607e6	managed_bytes: conform to allocation_strategy's max_preferred_allocation_size Instead of allocating a single blob_storage, chain multiple blob_storage objects in a list, each limited not to exceed the allocation_strategy's max_preferred_allocation_size. This allows lsa to allocate each blob_storage object as an lsa managed object that can be migrated in memory. Also provide linearize()/scatter() methods that can be used to temporarily consolidate the storage into a single blob_storage. This makes the data contiguous, so we can use a regular bytes_view to examine it.	2015-12-08 15:17:08 +02:00
Takuya ASADA	8c98e239d0	dist: use /etc/scylla as SCYLLA_CONF directory on AMI We don't need to copy /var/lib/scylla/conf on RAID anymore; it moved to /etc/scylla. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2015-12-08 11:09:12 +02:00
Avi Kivity	098136f4ab	Merge "Convert serialization of query::result to use db::serializer<>" from Tomasz Reviewed-by: Nadav Har'El <nyh@scylladb.com>	2015-12-07 16:53:34 +02:00
Amnon Heiman	3ce7fa181c	API: Add the implementation for index_summary_off_heap_memory This adds the implementation for the index_summary_off_heap_memory for a single column family and for all of them. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-12-07 15:15:39 +02:00
Amnon Heiman	e786f1d02f	sstable: Add get_summary function The get_summary method returns a const reference to the summary object. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-12-07 14:52:18 +02:00
Amnon Heiman	bae286a5b4	Add memory_footprint method to summary_ka Similar to origin's off-heap memory, memory_footprint is the size of the queues multiplied by the structure size. memory_footprint is used by the API to report the memory that is taken by the summary. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-12-07 14:52:18 +02:00
Amnon Heiman	2086c651ba	column_family: get_snapshot_details should return empty map for no snapshots If there is no snapshot directory for the specific column family, get_snapshot_details should return an empty map. This patch checks that a directory exists before trying to iterate over it. Fixes #619 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-12-07 12:51:04 +01:00
Tomasz Grabiec	b43b5af894	Merge tag 'tgrabiec/make-future-values-nothrow-move-constructible-v3' from seastar-dev.git Seastar's future<> now requires types to be nothrow move constructible. This series makes Scylla code comply.	2015-12-07 10:43:18 +01:00
Tomasz Grabiec	95f515a6bd	Move seastar submodule head Scylla changes: sstable.cc: Remove file_exists() function which conflicts with seastar's Amnon Heiman (2): reactor: Add file_exists method Add a wrapper for file_exists Avi Kivity (2): Merge "Introduce shared_future" from Tomasz Merge ""scripts: a few fixes in posix_net_conf.sh" from Vlad Gleb Natapov (3): rpc: not stop client in error state avoid allocation in parallel_for_each is there is nothing to do memory: fix size_to_idx calculation Nadav Har'El (1): test: fix use-after-free in timertest Paweł Dziepak (1): memory: use size instead of old_size to shrink memory block Tomasz Grabiec (7): file: Mark move constructor as noexcept core: future: Add static asserts about type's noexcept guarantees core: future: Drop now redundant move_noexcept flag core: future_state: Make state getters non-destructive for non-rvalue-refs core: future: Make get_available_state() noexcept core: Introduce shared_future Make json_return_type movable Vlad Zolotarov (8): scripts: posix_net_conf.sh: ban NIC IRQs from being moved by irqbalance scripts: posix_net_conf.sh: exclude CPU0 siblings from RPS scripts: posix_net_conf.sh: Configure XPS scripts: posix_net_conf.sh: Add a new mode for MQ NICs scripts: posix_net_conf.sh: increase some backlog sizes core: to_sstring(): cleanup core: to_sstring_strintf(): always use %g(or %lg) format for floating point values core: prevent explicit calls for to_sstring_sprintf()	2015-12-07 10:41:39 +01:00
Glauber Costa	79e70568d7	scylla-setup: do not add discard to the command line In a recent discussion with the XFS developers, Dave Chinner recommended us not to use discard, but rather issue fstrims explicitly. In machines like Amazon's c3-class, the situation is made worse by the fact that discard is not supported by the disk. Contrary to my intuition, adding the discard mount option in such situation is not a nop and will just create load for no reason. Signed-off-by: Glauber Costa <glommer@scylladb.com>	2015-12-07 11:22:27 +02:00
Tomasz Grabiec	934d3f06d1	api: Make histogram reduction work on domain value instead of json objects Objects extending json_base are not movable, so we won't be able to pass them via future<>, which will assert that types are nothrow move constructible. This problem only affects httpd::utils_json::histogram, which is used in map-reduce. This patch changes the aggregation to work on the domain value (utils::ihistogram) instead of json objects.	2015-12-07 09:50:28 +01:00
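
Commit 13324607e6 in the list above describes storing a value as a chain of size-limited blob_storage objects and temporarily consolidating it with linearize() so it can be read as one contiguous view. The following is a minimal sketch of that idea only, not Scylla's implementation: the type chunked_bytes and the free functions scatter() and linearize() are hypothetical names chosen for illustration, the max_chunk parameter stands in for the allocation strategy's max_preferred_allocation_size, and std::string replaces the real storage types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a chain of size-limited storage chunks.
struct chunked_bytes {
    std::vector<std::string> chunks;  // every chunk holds at most max_chunk bytes
};

// Split `data` into consecutive chunks of at most `max_chunk` bytes each.
inline chunked_bytes scatter(const std::string& data, std::size_t max_chunk) {
    assert(max_chunk > 0);  // a zero-sized chunk limit would never make progress
    chunked_bytes out;
    for (std::size_t pos = 0; pos < data.size(); pos += max_chunk) {
        // substr clamps the length at the end of the string, so the last
        // chunk may be shorter than max_chunk.
        out.chunks.push_back(data.substr(pos, max_chunk));
    }
    return out;
}

// Copy all chunks back into a single contiguous buffer.
inline std::string linearize(const chunked_bytes& cb) {
    std::size_t total = 0;
    for (const auto& c : cb.chunks) {
        total += c.size();
    }
    std::string out;
    out.reserve(total);  // one allocation for the consolidated copy
    for (const auto& c : cb.chunks) {
        out += c;
    }
    return out;
}
```

Calling scatter("abcdefghij", 4) in this sketch yields three chunks ("abcd", "efgh", "ij"), and linearize() on the result returns the original ten bytes. The copy made by linearize() is the cost the merge message for 'large-blobs/v3' refers to when it mentions extra copies on access.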

7595 Commits