scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 12:36:56 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	17b56eb459	compaction: leveled: improve log message for overlapping table Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <2dcbe3c8131f1d88a3536daa0b6cdd25c6e41d76.1464883077.git.raphaelsc@scylladb.com>	2016-06-05 18:20:01 +03:00
Raphael S. Carvalho	588ce915d6	compaction: disable parallel compaction for leveled strategy It was discussed that leveled strategy may not benefit from the parallel compaction feature because almost all compaction jobs will have similar size. It was also found that leveled strategy wasn't working correctly with it because two overlapping sstables (targeting the same level) could be created in parallel by two ongoing compactions. Fixes #1293. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <60fe165d611c0283ca203c6d3aa2662ab091e363.1464883077.git.raphaelsc@scylladb.com>	2016-06-05 18:20:00 +03:00
Amnon Heiman	5f84e55bf6	histogram: total needs to be incremented on plus operator The total counter (the one that counts the actual number of sample points) should be incremented when adding histograms. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1464172277-4251-1-git-send-email-amnon@scylladb.com>	2016-06-05 12:09:36 +03:00
Tomasz Grabiec	57413618e8	Merge branch 'range-tombstone-v9' from https://github.com/duarten/scylla.git From Duarte: This patchset adds the range_tombstone_list data structure, used to hold a set of disjoint range tombstones, and changes the internal representation of row tombstones to use that data structure. Fixes #1155 [tgrabiec: Added compound_wrapper::make_empty(const schema&) overload to fix compilation failure in tracing code]	2016-06-02 22:17:17 +02:00
Raphael S. Carvalho	3f4500cb71	db: compaction strategy changes via alter table must have immediate effect At the moment, compaction strategy changes via ALTER TABLE have no effect until node restart. Tomek says: "Statements of the following form should have immediate effect: ALTER TABLE t WITH compaction = { 'class' : 'LeveledCompactionStrategy' };" Fixes #877. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <3b72c494f887643b82a272ef0a9995edb970382c.1464726828.git.raphaelsc@scylladb.com>	2016-06-02 16:59:50 +02:00
Pekka Enberg	d03f65d94e	database: Don't use std::cbegin() and std::cend() They're not supported by GCC 4.9. Fixes #1305 Message-Id: <1464877984-27856-1-git-send-email-penberg@scylladb.com>	2016-06-02 16:57:24 +02:00
Duarte Nunes	c970d682d1	storage_service: Announce range tombstones feature This patch enables the RANGE_TOMBSTONES supported feature, meaning that the node is capable of accepting row entry tombstones as range tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	70083efee2	sstables: Read and write range tombstone bounds This patch uses the composite_marker to add inclusiveness information to the prefixes of a range tombstone. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	7628e403a3	sstables: Drop code for tombstone merging Since Scylla now supports proper range tombstones, the code for reading ranges from sstables and converting them to overlapping tombstones is no longer necessary, and is, in fact, wasteful as the internal representation converts overlapping tombstones back to ranges. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	79bff2742f	random_mutation_generator: Generate range tombstones Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	95594b8171	mutations: Encapsulate row tombstones difference This patch moves the difference between two mutation_partition's row_tombstones inside the range_tombstone_list. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	91aac30f12	mutations: Row tombstones are now a set of ranges This patch changes the type of the mutation partition's row_tombstones to be a range_tombstone_list, so that they are now represented as a set of disjoint ranges. All of its usages are updated accordingly. Fixes #1155 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Duarte Nunes	e46537b7d3	storage_service: Include range tombstones feature This patch adds the range tombstones feature, which is not enabled yet, to the storage_service, so that consumers can query for it. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:58 +02:00
Duarte Nunes	17a544c4a6	gossip: Add feature default ctor and operator= This allows a feature to be declared and initialized later. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:58 +02:00
Duarte Nunes	2c82dcd309	gossip: Decouple feature lifetime from the gossiper This patch changes the gms::feature destructor so it checks whether the gossiper has been stopped before trying to unregister the feature. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:58 +02:00
Duarte Nunes	351aaf9738	range_tombstone: Introduce range_tombstone_to_prefix_tombstone_converter This patch extracts the code from sstables/partition.cc which is used to transform a set of range tombstones into a set of overlapping scylladb tombstones. The range_tombstone_merger will be used to send mutations to nodes not yet updated to support the internal range tombstone representation. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:58 +02:00
Duarte Nunes	f7809bcaef	range_tombstone_list: Add unit test Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:58 +02:00
Duarte Nunes	284bb6b66f	range_tombstone_list: Make it ReversiblyMergeable This patch implements the ReversiblyMergeable cancellative monoid for the range_tombstone_list. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:58 +02:00
Duarte Nunes	86030885c8	mutations: Introduce range tombstone list This class is responsible for representing a set of range tombstones as non-overlapping disjoint sets of range tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:58 +02:00
Duarte Nunes	6a111fdd01	mutations: Introduce the range_tombstone class This patch introduces the range_tombstone class, composed of a [start, end] pair of clustering_key_prefixes, the type of inclusiveness of each bound, and a tombstone. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:58 +02:00
Duarte Nunes	dc8319ed91	keys: Remove schema argument from make_empty An empty key is independent of the schema. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:36 +02:00
Duarte Nunes	7f8c35dd8c	idl: Add range tombstone IDL This patch adds the range tombstone IDL, preserving backwards compatibility. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:36 +02:00
Duarte Nunes	9bd7d08fc7	idl-compiler: Default expr can refer to previous fields This patch changes the idl-compiler so that the default value of a field can be set to the value of a previous field in the class: class P { uint32_t x; uint32_t y = x; }; Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:36 +02:00
Duarte Nunes	e2812c1b7a	idl: Rename range_tombstone::key to start ... and make it a clustering_key_prefix, in preparation of supporting not-whole-row range tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:36 +02:00
Pekka Enberg	f64c25a495	cql3/statements/select_statement: Unify coding style The coding style in select_statement.cc is very inconsistent which makes the code hard to read. Clean that up. Message-Id: <1464871790-21031-1-git-send-email-penberg@scylladb.com>	2016-06-02 16:17:21 +02:00
Avi Kivity	6da0449fc7	tests: adjust config_test for db::string_map changes	2016-06-02 14:48:02 +03:00
Gleb Natapov	9132604a90	config: make string_map to be a unique type instead of an alias to unordered_map Config provides operators << >> for string_map which makes it impossible to have generic stream operators for unordered_map. Fix it by making string_map a separate type and not just an alias. Message-Id: <20160602102642.GJ9939@scylladb.com>	2016-06-02 13:28:40 +03:00
Asias He	96463cc17c	streaming: Fix indentation in do_send_mutations Message-Id: <bc8cfa7c7b29f08e70c0af6d2fb835124d0831ac.1464857352.git.asias@scylladb.com>	2016-06-02 11:56:03 +03:00
Asias He	206955e47c	streaming: Reduce memory usage when sending mutations Limit disk bandwidth to 5MB/s to emulate a slow disk: echo "8:0 5000000" > /cgroup/blkio/limit/blkio.throttle.write_bps_device echo "8:0 5000000" > /cgroup/blkio/limit/blkio.throttle.read_bps_device Start scylla node 1 with low memory: scylla -c 1 -m 128M --auto-bootstrap false Run c-s: taskset -c 7 cassandra-stress write duration=5m cl=ONE -schema 'replication(factor=1)' -pop seq=1..100000 -rate threads=20 limit=2000/s -node 127.0.0.1 Start scylla node 2 with low memory: scylla -c 1 -m 128M --auto-bootstrap true Without this patch, I saw std::bad_alloc during streaming ERROR 2016-06-01 14:31:00,196 [shard 0] storage_proxy - exception during mutation write to 127.0.0.1: std::bad_alloc (std::bad_alloc) ... ERROR 2016-06-01 14:31:10,172 [shard 0] database - failed to move memtable to cache: std::bad_alloc (std::bad_alloc) ... To fix: 1. Apply the streaming mutation limiter before we read the mutation into memory to avoid wasting memory holding a mutation which we cannot send. 2. Reduce the parallelism of sending streaming mutations. Before, we sent each range in parallel; after, we send the ranges one by one. before: nr_vnode * nr_shard * (send_info + cf.make_reader memory usage) after: nr_shard * (send_info + cf.make_reader memory usage) We can at least reduce memory usage by a factor of nr_vnode, 256 by default. In my setup, fix 1) alone is not enough; with both fix 1) and 2), I saw no std::bad_alloc. Also, I did not see streaming bandwidth drop due to 2). In addition, I tested grow_cluster_test.py:GrowClusterTest.test_grow_3_to_4, as described: https://github.com/scylladb/scylla/issues/1270#issuecomment-222585375 With this patch, I saw no std::bad_alloc any more. Fixes: #1270 Message-Id: <7703cf7a9db40e53a87f0f7b5acbb03fff2daf43.1464785542.git.asias@scylladb.com>	2016-06-02 11:01:58 +03:00
Gleb Natapov	1476becd28	config: put operators << and >> into db namespace Makes ADL find the right version of the overload. Message-Id: <20160601130952.GJ2381@scylladb.com>	2016-06-02 10:45:01 +03:00
Pekka Enberg	b6b2c84316	Merge "CQL tracing" from Vlad "This series introduces a tracing infrastructure that may be used for tracing CQL commands execution and measuring latencies of separate stages of CQL handling as defined by a CQL binary protocol specification. To begin tracing one should create a "tracing session", which may then be used to issue tracing events. If execution of a specific CQL command involves other Nodes (not only a Coordinator), then a "tracing session ID" is passed to that Node (in the context of the corresponding RPC call). Then this "session ID" may be used to create a "secondary tracing session" to issue tracing events in the context of the original session. The series contains an implementation of tracing that uses a keyspace in the current cluster for storing tracing information. This series contains a demo per-request tracing instrumentation of a QUERY CQL command and even this instrumentation is partial: it only fully instruments a QUERY->SELECT->read_data call chain. This is by all means the very beginning of the proper instrumentation which is to come. 
Right now the latencies for a single SELECT for a single row with RF 1 from a 2 Nodes cluster on my laptop started using ccm (for C* all default parameters, for scylla - memory 256MB, --smp 2) are as follows (pseudo-graphics warning):

 | scylla (2 Nodes x 2 shards each) | C* 2.1.8
Coordinator and replica are same Node (TRACING OFF): c-s with a single thread mean latency value | 0.3ms (was 0.2ms before the last rebase with a master) | 0.3ms
Coordinator and replica are same Node (TRACING ON): running a SELECT command from a cqlsh a few times | ~250us | ~1200us
Coordinator and replica are not on the same Node (TRACING ON) | ~700us | >2500us

To begin tracing one may use the cqlsh "TRACING ON/OFF" commands:

cqlsh> TRACING ON
Now Tracing is enabled
cqlsh> select "C0", "C1" from keyspace1.standard1 where key=0x12345679;

 C0                 | C1
--------------------+------
 0x000000000001e240 | null

(1 rows)

Tracing session: 146f0180-21e7-11e6-b244-000000000000

 activity | timestamp | source | source_elapsed
 select "C0", "C1" from keyspace1.standard1 where key=0x12345679; | 2016-05-24 22:38:24.536000 | 127.0.0.1 | 0
 message received from /127.0.0.1 [0] | 2016-05-24 22:38:24.537000 | 127.0.0.2 | --
 Done reading options [0] | 2016-05-24 22:38:24.537000 | 127.0.0.1 | 3
 read_data handling is done [0] | 2016-05-24 22:38:24.537000 | 127.0.0.2 | 37
 Parsing a statement [0] | 2016-05-24 22:38:24.537000 | 127.0.0.1 | 3
 Processing a statement [0] | 2016-05-24 22:38:24.537000 | 127.0.0.1 | 56
 Done processing - preparing a result [0] | 2016-05-24 22:38:24.537000 | 127.0.0.1 | 550
 Request complete | 2016-05-24 22:38:24.536560 | 127.0.0.1 | 560

cqlsh>"	2016-06-02 08:35:33 +03:00
Avi Kivity	c7953897d1	build: remove obsolete log.cc dependency	2016-06-01 22:35:07 +03:00
Vlad Zolotarov	69bd8efc40	storage_proxy: instrument a read_data handler to accept a tracing info This is a demo instrumentation: - Check if a tracing info is present in the read_command. - If yes - create a tracing session with the given tracing session ID. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:17:25 +03:00
Vlad Zolotarov	4c17a422e0	cql3: instrument a SELECT query to send tracing info Instrument a coordinator of a SELECT query to send tracing session info to the corresponding replica Nodes. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:17:25 +03:00
Vlad Zolotarov	6e26909b02	query::read_command: add an optional trace_info field Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:17:19 +03:00
Vlad Zolotarov	a53d329b25	tracing: add a serializable trace_info object tracing::trace_info is used to pass the tracing information between nodes. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:16:53 +03:00
Vlad Zolotarov	099ff0d2d5	transport: instrument a QUERY with tracing - Store a trace state inside a client_state. - Start tracing in a cql_server::connection::process_query(). Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:14:29 +03:00
Vlad Zolotarov	f994e0a8d0	transport/server: add support for sending a tracing session ID in a CQL response - Add a tracing ID (UUID) optional field to cql_server::response. - If _tracing_id is set make_frame() would insert a tracing ID in the response message. According to CQL spec it should be the first thing in the response "body" and the TRACING bit (0x02) should be set in the "flags" field. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Vlad Zolotarov	9e61a3498d	cql_server::response: rework make_frame() Use a template function to avoid code duplication. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Vlad Zolotarov	8bf34fca02	service::client_state: store a client address When client_state is created with an external_tag - store a client address in the client state. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Vlad Zolotarov	c58c56bccc	gms::inet_address: add a constructor from socket_address Currently only IPv4 addresses are supported. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Vlad Zolotarov	63c724c41d	service::client_state: make private fields actually private Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Vlad Zolotarov	4b43b08ffc	main: start a tracing service Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Vlad Zolotarov	c965528a03	tracing: add a trace_state and tracing classes trace_state: Is a single tracing session. tracing: A sharded service that contains an i_trace_backend_helper instance and is a "factory" of trace_state objects. trace_state main interface functions are: - begin(): Start time counting (should be used via tracing::begin() wrapper). - trace(): Create a tracing event - it's coupled with a time passed since begin() (should be used via tracing::trace() wrapper). - ~trace_state(): Destructor will close the tracing session. "tracing" service main interface functions are: - start(): Initialize a backend. - stop(): Shut down a backend. - create_session(): Creates a new tracing session. (tracing::end_session(): Is called by a trace_state destructor). When trace_state needs to store a tracing event it uses a backend helper from a "tracing" service. A "tracing" service limits the number of open tracing sessions to a static number. If this number is reached, subsequent sessions are dropped. trace_state implements a similar strategy for the number of tracing events per single session. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:42 +03:00
Vlad Zolotarov	fa14ad3a99	service/client_state: don't allow modification of a system_trace KS Only users with enough permissions are allowed to modify system_trace KS. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:12:19 +03:00
Vlad Zolotarov	d3988a8113	tracing::trace_keyspace_helper: a keyspace based i_tracing_backend_helper implementation Uses a CQL keyspace system_traces to store tracing information. Uses two tables: CREATE TABLE system_traces.sessions ( session_id uuid, command text, client inet, coordinator inet, duration int, parameters map<text, text>, request text, started_at timestamp, PRIMARY KEY ((session_id))) and CREATE TABLE system_traces.events ( session_id uuid, event_id timeuuid, activity text, source inet, source_elapsed int, thread text, PRIMARY KEY ((session_id), event_id)) system_traces.sessions table contains records of tracing sessions. system_traces.sessions columns description: - session_id: an ID of the session. - command: type of a command this session was created for (currently supported "NONE", "QUERY" and "REPAIR"). - client: IP of the client that issued the command. - coordinator: IP of a coordinator that received the command. - duration: total duration of the tracing session (in us). - parameters: optional parameters for this session, passed to i_trace_state::begin() call. - request: a CQL command this tracing session is created for. - started_at: the time the session has been started at. system_traces.events contains records of separate tracing events. system_traces.events columns description: - session_id: an ID of the session. - event_id: an ID of the event. - activity: the trace point description - a message given to i_trace_state::trace(). - source: IP of the Node where trace event was issued. - source_elapsed: time passed since creation of a tracing session (in us) on the Node where this trace event was issued. - thread: name of the thread in whose context this trace event was issued (currently it's "core N", where 'N' is an index of a shard the trace event was issued on). This class will cache lambdas creating the corresponding mutations for each tracing record requested to be stored till flush() method is called. 
flush() will merge all pending mutations to "sessions" and "events" tables and then apply a mutation to "events" table and when it completes - to "sessions" table. This way it'll ensure that when some tracing session is visible, all its events are visible too. trace_keyspace_helper exposes a few metrics via collectd: - tracing_error - a total number of errors (not including OOM) - bad_column_family_errors - number of times a tracing record wasn't stored because system_trace tables' schema didn't match the expected value. This may happen if a DB administrator is doing funny things like altering the schemas of the above tables. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:12:19 +03:00
Vlad Zolotarov	a2994ffd7f	tracing: add i_tracing_backend_helper interface This class represents an interface for a specific backend that is going to store tracing information. A specific implementation may, and is expected to, implement caching of pending tracing records. Interface functions are: - start(): Initialize a backend (e.g. create keyspace and tables). - stop(): Flush all pending work and shut down the backend. - store_session_record()/store_event_record(): Cache/store the corresponding tracing records. - flush(): Flush pending tracing records. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:12:13 +03:00
Gleb Natapov	91c773fdde	storage_proxy: fix writes_attempts counter writes_attempts is supposed to count how many times data was sent out, but currently it also counts replicas in other DCs that get the data through a coordinator. Fix it by counting only when data is actually sent. Message-Id: <20160601153124.GB9939@scylladb.com>	2016-06-01 18:46:23 +03:00
Avi Kivity	8dcbddc7ed	Merge "Serialize memtable flushes" from Glauber "One of the things we need to do as part of the throttle rework I am doing is to serialize memtable flushes to some extent - that will guarantee that in case we're throttling, the flushes finish earlier and release memory earlier, if compared to the case in which we just let all tables flush freely and simultaneously."	2016-06-01 18:31:18 +03:00
Avi Kivity	0c7b2e2d5c	Merge	2016-06-01 18:29:23 +03:00
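
The memory estimate quoted in the streaming commit (206955e47c) can be checked with a little arithmetic. The following is a minimal Python sketch, not Scylla's C++ code: the function name `streaming_memory` and the 64 KiB per-range cost are assumptions made up for illustration, while the two formulas and the default of 256 vnodes come from the commit message.

```python
# Hypothetical per-range cost in bytes (send_info plus the reader made by
# cf.make_reader). The real figure is workload dependent; this is a placeholder.
PER_RANGE_COST = 64 * 1024


def streaming_memory(nr_shard, per_range_cost, nr_vnode=None):
    """Estimate memory held by in-flight range sends.

    With nr_vnode given, every vnode range is sent in parallel on every
    shard (the "before" formula). Without it, each shard sends one range
    at a time (the "after" formula).
    """
    concurrent_ranges = nr_shard * (nr_vnode if nr_vnode is not None else 1)
    return concurrent_ranges * per_range_cost


before = streaming_memory(nr_shard=1, per_range_cost=PER_RANGE_COST, nr_vnode=256)
after = streaming_memory(nr_shard=1, per_range_cost=PER_RANGE_COST)
print(before // after)  # 256, the nr_vnode factor quoted in the commit
```

Whatever the per-range cost really is, it cancels out of the ratio, which is why the commit can state the saving as a factor of nr_vnode.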

9488 Commits
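
The histogram fix listed above (5f84e55bf6) says that the total sample counter must be combined when two histograms are added. A minimal Python sketch of that idea follows; the `Histogram` class and its field names are hypothetical and do not mirror Scylla's C++ type.

```python
from dataclasses import dataclass


@dataclass
class Histogram:
    """Toy histogram summary; field names are illustrative only."""
    total: int = 0          # number of sample points ever recorded
    value_sum: float = 0.0  # sum of the recorded values

    def record(self, value):
        self.total += 1
        self.value_sum += value

    def __add__(self, other):
        # The point of the fix: total has to be added as well, otherwise the
        # merged histogram under-reports how many samples it represents.
        return Histogram(
            total=self.total + other.total,
            value_sum=self.value_sum + other.value_sum,
        )


a, b = Histogram(), Histogram()
for v in (1.0, 2.0):
    a.record(v)
b.record(3.0)
merged = a + b
print(merged.total)  # 3
```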