scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 04:26:48 +00:00

Author	SHA1	Message	Date
Duarte Nunes	9bd7d08fc7	idl-compiler: Default expr can refer to previous fields This patch changes the idl-compiler so that the default value of a field can be set to the value of a previous field in the class: class P { uint32_t x; uint32_t y = x; }; Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:36 +02:00
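The default-value rule from 9bd7d08fc7 can be pictured with a small sketch. The log does not show the code the idl-compiler generates, so `read_p` and the use of `std::optional` to model a field that is absent on the wire are assumptions made only for illustration.

```cpp
#include <cstdint>
#include <optional>

// Mirrors the IDL example from the commit message:
//   class P { uint32_t x; uint32_t y = x; };
struct P {
    std::uint32_t x;
    std::uint32_t y;
};

// Hypothetical reader: when `y` was not serialized (for example, by an older
// writer), it falls back to the value of the previously read field `x`.
inline P read_p(std::uint32_t x, std::optional<std::uint32_t> y_on_wire) {
    P p{};
    p.x = x;
    p.y = y_on_wire.value_or(p.x);  // default expression refers to field x
    return p;
}
```

The point of the sketch is only the ordering: `x` must be read before the default for `y` is evaluated.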
Duarte Nunes	e2812c1b7a	idl: Rename range_tombstone::key to start ... and make it a clustering_key_prefix, in preparation of supporting not-whole-row range tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:36 +02:00
Pekka Enberg	f64c25a495	cql3/statements/select_statement: Unify coding style The coding style in select_statement.cc is very inconsistent which makes the code hard to read. Clean that up. Message-Id: <1464871790-21031-1-git-send-email-penberg@scylladb.com>	2016-06-02 16:17:21 +02:00
Avi Kivity	6da0449fc7	tests: adjust config_test for db::string_map changes	2016-06-02 14:48:02 +03:00
Gleb Natapov	9132604a90	config: make string_map to be a unique type instead of an alias to unordered_map Config provides operators << >> for string_map which makes it impossible to have generic stream operators for unordered_map. Fix it by making string_map a separate type and not just an alias. Message-Id: <20160602102642.GJ9939@scylladb.com>	2016-06-02 13:28:40 +03:00
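A minimal sketch of why a distinct type helps, as described in 9132604a90: once `string_map` is its own type, its stream operator no longer collides with a generic one for `std::unordered_map`. The output formats, the `to_text` helper, and the generic operator are hypothetical; only the idea of a separate type living in namespace `db` comes from the log.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical generic printer for any std::unordered_map.
template <typename K, typename V>
std::ostream& operator<<(std::ostream& os, const std::unordered_map<K, V>& m) {
    os << "{";
    bool first = true;
    for (const auto& kv : m) {
        if (!first) {
            os << ", ";
        }
        first = false;
        os << kv.first << ": " << kv.second;
    }
    return os << "}";
}

namespace db {

// A distinct type rather than an alias, so overloads on it are unambiguous.
struct string_map : std::unordered_map<std::string, std::string> {
    using std::unordered_map<std::string, std::string>::unordered_map;
};

// Found through argument-dependent lookup because it lives in namespace db.
inline std::ostream& operator<<(std::ostream& os, const string_map& m) {
    bool first = true;
    for (const auto& kv : m) {
        if (!first) {
            os << ",";
        }
        first = false;
        os << kv.first << "=" << kv.second;
    }
    return os;
}

} // namespace db

// Hypothetical helper that streams a value into a string.
template <typename T>
std::string to_text(const T& value) {
    std::ostringstream ss;
    ss << value;
    return ss.str();
}
```

For a `db::string_map` argument the `db` overload is an exact match, while the generic template needs a derived-to-base conversion, so the config-specific format wins.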
Asias He	96463cc17c	streaming: Fix indentation in do_send_mutations Message-Id: <bc8cfa7c7b29f08e70c0af6d2fb835124d0831ac.1464857352.git.asias@scylladb.com>	2016-06-02 11:56:03 +03:00
Asias He	206955e47c	streaming: Reduce memory usage when sending mutations Limit disk bandwidth to 5MB/s to emulate a slow disk: echo "8:0 5000000" > /cgroup/blkio/limit/blkio.throttle.write_bps_device echo "8:0 5000000" > /cgroup/blkio/limit/blkio.throttle.read_bps_device Start scylla node 1 with low memory: scylla -c 1 -m 128M --auto-bootstrap false Run c-s: taskset -c 7 cassandra-stress write duration=5m cl=ONE -schema 'replication(factor=1)' -pop seq=1..100000 -rate threads=20 limit=2000/s -node 127.0.0.1 Start scylla node 2 with low memory: scylla -c 1 -m 128M --auto-bootstrap true Without this patch, I saw std::bad_alloc during streaming ERROR 2016-06-01 14:31:00,196 [shard 0] storage_proxy - exception during mutation write to 127.0.0.1: std::bad_alloc (std::bad_alloc) ... ERROR 2016-06-01 14:31:10,172 [shard 0] database - failed to move memtable to cache: std::bad_alloc (std::bad_alloc) ... To fix: 1. Apply the streaming mutation limiter before we read the mutation into memory to avoid wasting memory holding the mutation which we can not send. 2. Reduce the parallelism of sending streaming mutations. Before we send each range in parallel, after we send each range one by one. before: nr_vnode * nr_shard * (send_info + cf.make_reader memory usage) after: nr_shard * (send_info + cf.make_reader memory usage) We can at least save memory usage by the factor of nr_vnode, 256 by default. In my setup, fix 1) alone is not enough, with both fix 1) and 2), I saw no std::bad_alloc. Also, I did not see streaming bandwidth dropped due to 2). In addition, I tested grow_cluster_test.py:GrowClusterTest.test_grow_3_to_4, as described: https://github.com/scylladb/scylla/issues/1270#issuecomment-222585375 With this patch, I saw no std::bad_alloc any more. Fixes: #1270 Message-Id: <7703cf7a9db40e53a87f0f7b5acbb03fff2daf43.1464785542.git.asias@scylladb.com>	2016-06-02 11:01:58 +03:00
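The before/after estimate in 206955e47c is simple arithmetic, sketched below. The function names and the per-range byte figure are hypothetical; `per_range_bytes` stands for the commit's "send_info + cf.make_reader memory usage" term.

```cpp
#include <cstdint>

namespace sketch {

// Before the patch: every vnode range on every shard is sent in parallel.
constexpr std::uint64_t streaming_memory_before(std::uint64_t nr_vnode,
                                                std::uint64_t nr_shard,
                                                std::uint64_t per_range_bytes) {
    return nr_vnode * nr_shard * per_range_bytes;
}

// After the patch: each shard sends its ranges one by one.
constexpr std::uint64_t streaming_memory_after(std::uint64_t nr_shard,
                                               std::uint64_t per_range_bytes) {
    return nr_shard * per_range_bytes;
}

} // namespace sketch
```

With the default of 256 vnodes the ratio between the two results is 256, which matches the saving quoted in the commit message.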
Gleb Natapov	1476becd28	config: put operators << and >> into db namespace Makes ADL find the right version of the overload. Message-Id: <20160601130952.GJ2381@scylladb.com>	2016-06-02 10:45:01 +03:00
Pekka Enberg	b6b2c84316	Merge "CQL tracing" from Vlad "This series introduces a tracing infrastructure that may be used for tracing CQL commands execution and measuring latencies of separate stages of CQL handling as defined by a CQL binary protocol specification. To begin tracing one should create a "tracing session", which may then be used for issuing tracing events. If execution of a specific CQL command involves other Nodes (not only a Coordinator), then a "tracing session ID" is passed to that Node (in the context of the corresponding RPC call). Then this "session ID" may be used to create a "secondary tracing session" to issue tracing events in the context of the original session. The series contains an implementation of tracing that uses a keyspace in the current cluster for storing tracing information. This series contains a demo per-request tracing instrumentation of a QUERY CQL command and even this instrumentation is partial: it only fully instruments a QUERY->SELECT->read_data call chain. This is by all means the very beginning of the proper instrumentation which is to come.
Right now the latencies for a single SELECT for a single row with RF 1 from a 2 Nodes cluster on my laptop started using ccm (for C* all default parameters, for scylla - memory 256MB, --smp 2) are as follows (pseudo-graphics warning):
---------------------------------------+----------------------------------+----------
                                       | scylla (2 Nodes x 2 shards each) | C* 2.1.8
---------------------------------------+----------------------------------+----------
Coordinator and replica are same Node  |                                  |
(TRACING OFF):                         | 0.3ms                            | 0.3ms
c-s with a single thread mean latency  | (was 0.2ms before the last       |
value                                  | rebase with a master)            |
---------------------------------------+----------------------------------+----------
Coordinator and replica are same Node  |                                  |
(TRACING ON)                           | ~250us                           | ~1200us
Running a SELECT command from a cqlsh  |                                  |
a few times                            |                                  |
---------------------------------------+----------------------------------+----------
Coordinator and replica are not on the |                                  |
same Node                              | ~700us                           | >2500us
(TRACING ON)                           |                                  |
---------------------------------------+----------------------------------+----------
To begin tracing one may use the cqlsh "TRACING ON/OFF" commands:
cqlsh> TRACING ON
Now Tracing is enabled
cqlsh> select "C0", "C1" from keyspace1.standard1 where key=0x12345679;
 C0                 | C1
--------------------+------
 0x000000000001e240 | null
(1 rows)
Tracing session: 146f0180-21e7-11e6-b244-000000000000
 activity                                                         | timestamp                  | source    | source_elapsed
------------------------------------------------------------------+----------------------------+-----------+----------------
 select "C0", "C1" from keyspace1.standard1 where key=0x12345679; | 2016-05-24 22:38:24.536000 | 127.0.0.1 | 0
 message received from /127.0.0.1 [0]                             | 2016-05-24 22:38:24.537000 | 127.0.0.2 | --
 Done reading options [0]                                         | 2016-05-24 22:38:24.537000 | 127.0.0.1 | 3
 read_data handling is done [0]                                   | 2016-05-24 22:38:24.537000 | 127.0.0.2 | 37
 Parsing a statement [0]                                          | 2016-05-24 22:38:24.537000 | 127.0.0.1 | 3
 Processing a statement [0]                                       | 2016-05-24 22:38:24.537000 | 127.0.0.1 | 56
 Done processing - preparing a result [0]                         | 2016-05-24 22:38:24.537000 | 127.0.0.1 | 550
 Request complete                                                 | 2016-05-24 22:38:24.536560 | 127.0.0.1 | 560
cqlsh>"	2016-06-02 08:35:33 +03:00
Avi Kivity	c7953897d1	build: remove obsolete log.cc dependency	2016-06-01 22:35:07 +03:00
Vlad Zolotarov	69bd8efc40	storage_proxy: instrument a read_data handler to accept a tracing info This is a demo instrumentation: - Check if a tracing info is present in the read_command. - If yes - create a tracing session with the given tracing session ID. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:17:25 +03:00
Vlad Zolotarov	4c17a422e0	cql3: instrument a SELECT query to send tracing info Instrument a coordinator of a SELECT query to send tracing session info to the corresponding replica Nodes. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:17:25 +03:00
Vlad Zolotarov	6e26909b02	query::read_command: add an optional trace_info field Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:17:19 +03:00
Vlad Zolotarov	a53d329b25	tracing: add a serializable trace_info object tracing::trace_info is used to pass the tracing information between nodes. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:16:53 +03:00
Vlad Zolotarov	099ff0d2d5	transport: instrument a QUERY with tracing - Store a trace state inside a client_state. - Start tracing in a cql_server::connection::process_query(). Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:14:29 +03:00
Vlad Zolotarov	f994e0a8d0	transport/server: add support for sending a tracing session ID in a CQL response - Add a tracing ID (UUID) optional field to cql_server::response. - If _tracing_id is set make_frame() would insert a tracing ID in the response message. According to CQL spec it should be the first thing in the response "body" and the TRACING bit (0x02) should be set in the "flags" field. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Vlad Zolotarov	9e61a3498d	cql_server::response: rework make_frame() Use a template function to avoid code duplication. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Vlad Zolotarov	8bf34fca02	service::client_state: store a client address When client_state is created with an external_tag - store a client address in the client state. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Vlad Zolotarov	c58c56bccc	gms::inet_address: add a constructor from socket_address Currently only IPv4 addresses are supported. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Vlad Zolotarov	63c724c41d	service::client_state: make private fields actually private Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Vlad Zolotarov	4b43b08ffc	main: start a tracing service Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Vlad Zolotarov	c965528a03	tracing: add a trace_state and tracing classes trace_state: Is a single tracing session. tracing: A sharded service that contains an i_trace_backend_helper instance and is a "factory" of trace_state objects. trace_state main interface functions are: - begin(): Start time counting (should be used via tracing::begin() wrapper). - trace(): Create a tracing event - it's coupled with a time passed since begin() (should be used via tracing::trace() wrapper). - ~trace_state(): Destructor will close the tracing session. "tracing" service main interface function is: - start(): Initialize a backend. - stop(): Shut down a backend. - create_session(): Creates a new tracing session. (tracing::end_session(): Is called by a trace_state destructor). When trace_state needs to store a tracing event it uses a backend helper from a "tracing" service. A "tracing" service limits the number of open tracing sessions by a static number. If this number is reached, further sessions are dropped. trace_state implements a similar strategy with regard to tracing events per single session. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:42 +03:00
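The session lifecycle described in c965528a03 might look roughly like the sketch below. It is a single-threaded approximation, not the Seastar sharded service: the backend helper is omitted, and the limits, the `event_count()` and `active_sessions()` accessors, and the choice to return a null pointer for a dropped session are all assumptions.

```cpp
#include <chrono>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

class tracing;

// One tracing session; the destructor closes it, as in the commit message.
class trace_state {
    tracing& _owner;
    std::chrono::steady_clock::time_point _start{};
    std::vector<std::pair<std::chrono::microseconds, std::string>> _events;
    static constexpr std::size_t max_events = 4;  // hypothetical per-session cap
public:
    explicit trace_state(tracing& owner) : _owner(owner) {}
    trace_state(const trace_state&) = delete;
    trace_state& operator=(const trace_state&) = delete;
    ~trace_state();

    // Start time counting.
    void begin() { _start = std::chrono::steady_clock::now(); }

    // Record an event together with the time elapsed since begin().
    void trace(std::string message) {
        if (_events.size() >= max_events) {
            return;  // events beyond the cap are dropped
        }
        auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::steady_clock::now() - _start);
        _events.emplace_back(elapsed, std::move(message));
    }

    std::size_t event_count() const { return _events.size(); }
};

// Factory of trace_state objects with a static cap on open sessions.
class tracing {
    std::size_t _active = 0;
    std::size_t _max_sessions;
public:
    explicit tracing(std::size_t max_sessions) : _max_sessions(max_sessions) {}

    std::unique_ptr<trace_state> create_session() {
        if (_active >= _max_sessions) {
            return nullptr;  // session is dropped once the cap is reached
        }
        ++_active;
        return std::make_unique<trace_state>(*this);
    }

    // Called by the trace_state destructor.
    void end_session() { --_active; }

    std::size_t active_sessions() const { return _active; }
};

inline trace_state::~trace_state() { _owner.end_session(); }

} // namespace sketch
```

Tying `end_session()` to the destructor means a session slot is released as soon as the last owner of the `trace_state` goes away, without an explicit close call.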
Vlad Zolotarov	fa14ad3a99	service/client_state: don't allow modification of a system_trace KS Only users with enough permissions are allowed to modify system_trace KS. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:12:19 +03:00
Vlad Zolotarov	d3988a8113	tracing::trace_keyspace_helper: a keyspace based i_tracing_backend_helper implementation Uses a CQL keyspace system_traces to store tracing information. Uses two tables: CREATE TABLE system_traces.sessions ( session_id uuid, command text, client inet, coordinator inet, duration int, parameters map<text, text>, request text, started_at timestamp, PRIMARY KEY ((session_id))) and CREATE TABLE system_traces.events ( session_id uuid, event_id timeuuid, activity text, source inet, source_elapsed int, thread text, PRIMARY KEY ((session_id), event_id)) system_traces.sessions table contains records of tracing sessions. system_traces.sessions columns description: - session_id: an ID of the session. - command: type of a command this session was created for (currently supported "NONE", "QUERY" and "REPAIR"). - client: IP of the client that issued the command. - coordinator: IP of a coordinator that received the command. - duration: total duration of the tracing session (in us). - parameters: optional parameters for this session, passed to i_trace_state::begin() call. - request: a CQL command this tracing session is created for. - started_at: the time the session has been started at. system_traces.events contains records of separate tracing events. system_traces.events columns description: - session_id: an ID of the session. - event_id: an ID of the event. - activity: the trace point description - a message given to i_trace_state::trace(). - source: IP of the Node where trace event was issued. - source_elapsed: time passed since creation of a tracing session (in us) on the Node where this trace event was issued. - thread: name of the thread in whose context this trace event was issued (currently it's "core N", where 'N' is an index of a shard the trace event was issued on). This class will cache lambdas creating the corresponding mutations for each tracing record requested to be stored till flush() method is called.
flush() will merge all pending mutations to "sessions" and "events" tables and then apply a mutation to "events" table and when it completes - to "sessions" table. This way it'll ensure that when some tracing session is visible, all its events are visible too. trace_keyspace_helper exposes a few metrics via collectd: - tracing_error - a total number of errors (not including OOM) - bad_column_family_errors - number of times a tracing record wasn't stored because system_trace tables' schema didn't match the expected value. This may happen if a DB administrator is doing funny things like altering the schemas of the above tables. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:12:19 +03:00
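The flush ordering described in d3988a8113 (events first, then sessions) can be sketched as follows. Mutations are modelled as plain strings and the writes are synchronous, both of which are simplifications; `keyspace_backend_sketch` and `applied()` are hypothetical names, and the real helper works with Seastar futures.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

class keyspace_backend_sketch {
    // A cached lambda that builds the mutation only when flush() runs.
    using mutation_maker = std::function<std::string()>;

    std::vector<mutation_maker> _pending_sessions;
    std::vector<mutation_maker> _pending_events;
    std::vector<std::string> _applied;  // stands in for the two tables
public:
    void store_session_record(mutation_maker make) {
        _pending_sessions.push_back(std::move(make));
    }

    void store_event_record(mutation_maker make) {
        _pending_events.push_back(std::move(make));
    }

    // Apply "events" before "sessions", so that a visible session always has
    // all of its events visible too.
    void flush() {
        for (auto& make : _pending_events) {
            _applied.push_back(make());
        }
        _pending_events.clear();
        for (auto& make : _pending_sessions) {
            _applied.push_back(make());
        }
        _pending_sessions.clear();
    }

    const std::vector<std::string>& applied() const { return _applied; }
};

} // namespace sketch
```

Even when the session record is queued before its events, the order in which `flush()` applies them is what a reader of the tables observes.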
Vlad Zolotarov	a2994ffd7f	tracing: add i_tracing_backend_helper interface This class represents an interface for a specific backend that is going to store tracing information. The specific implementation may and expected to implement caching of pending tracing records. Interface functions are: - start(): Initialize a backend (e.g. create keyspace and tables). - stop(): Flush all pending work and shut down the backend. - store_session_record()/store_event_record(): Cache/store the corresponding tracing records. - flush(): Flush pending tracing records. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:12:13 +03:00
Gleb Natapov	91c773fdde	storage_proxy: fix writes_attempts counter writes_attempts is supposed to count how many times data was sent out, but currently it also counts those replicas in other DCs that get the data through a coordinator. Fix it by counting only when data is actually sent. Message-Id: <20160601153124.GB9939@scylladb.com>	2016-06-01 18:46:23 +03:00
Avi Kivity	8dcbddc7ed	Merge "Serialize memtable flushes" from Glauber "One of the things we need to do as part of the throttle rework I am doing is to serialize memtable flushes to some extent - that will guarantee that in case we're throttling, the flushes finish earlier and release memory earlier, if compared to the case in which we just let all tables flush freely and simultaneously."	2016-06-01 18:31:18 +03:00
Avi Kivity	0c7b2e2d5c	Merge	2016-06-01 18:29:23 +03:00
Avi Kivity	d2e4548b35	Merge seastar upstream * seastar 0bcdd28...864d6dc (4): > Logging framework > Add libubsan and libasan to fedora deps docs > tests: add rpc cancellable tests > rpc: add cancellable interface Dropped logging implementation in favor of seastar's due to a link conflict with operator<<.	2016-06-01 18:28:42 +03:00
Tomasz Grabiec	56736389c1	Merge branch 'sstable-errors/v2' from https://github.com/penberg/scylla.git This series adds a constructor to malformed_sstable_exception that includes a filename and converts some call-sites to use it. There are still plenty of low-level sites that don't even know the sstable filename they are operating on. We need to either change the code to carry the filename to lower layers or find a higher-level call-site where we can catch malformed_sstable_exception and rethrow it with the sstable filename. But that's for another series by someone who knows the sstable code well. Refs #669.	2016-06-01 16:59:56 +02:00
Gleb Natapov	26b50eb8f4	storage_proxy: drop debug output Message-Id: <20160601132641.GK2381@scylladb.com>	2016-06-01 17:13:12 +03:00
Pekka Enberg	94c35cc135	sstables/sstables: Add sstable filename to thrown malformed_sstable_exceptions	2016-06-01 17:11:05 +03:00
Pekka Enberg	3ca7fc2a8b	database: Add sstable filename to thrown malformed_sstable_exceptions	2016-06-01 14:56:10 +03:00
Pekka Enberg	fa5354dda4	sstables: Add optional filename to malformed_sstable_exception Add a constructor to malformed_sstable_exception that accepts an error message and an sstable name.	2016-06-01 14:48:08 +03:00
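A sketch of the two-argument constructor from fa5354dda4. The base class and the exact wording of the combined message are assumptions; the log only says that the new constructor accepts an error message and an sstable name.

```cpp
#include <stdexcept>
#include <string>

namespace sketch {

class malformed_sstable_exception : public std::runtime_error {
public:
    // Existing form: message only, for call sites that do not know the file.
    explicit malformed_sstable_exception(const std::string& message)
        : std::runtime_error(message) {}

    // New form: message plus the sstable filename (format is hypothetical).
    malformed_sstable_exception(const std::string& message,
                                const std::string& sstable_filename)
        : std::runtime_error(message + " in sstable " + sstable_filename) {}
};

} // namespace sketch
```

Low-level code that lacks the filename can keep throwing the one-argument form, which is the gap the sstable-errors/v2 merge message leaves for a later series.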
Pekka Enberg	de0634c289	Merge "Extract modification_statement's (and related) parsed statement into raw" from Avi "Move parsed statements into raw namespace. Mindless but therapeutic."	2016-06-01 14:19:53 +03:00
Avi Kivity	92d815a6cf	Make github issue template less shouty	2016-06-01 10:45:04 +03:00
Pekka Enberg	0255318bf3	Revert "Revert "main: change order between storage service and drain execution during exit"" This reverts commit `b3ed55be1d`. The issue is in the failing dtest, not this commit. Gleb writes: "The bug is in the test, not the patch. Test waits for repair session to end one way or the other when node is killed, but for nodetool to know if repair is completed it needs to poll for it. If node dies before nodetool managed to see repair completion it will stuck forever since jmx is alive, but does not provide answers any more. The patch changes timing, repair is completed much close to exit now, so problem appears, but it may happen even without the patch. The fix is for dtest to kill jmx as part of killing a node operation." Now that Lucas fixed the problem in scylla-ccm, revert the revert.	2016-06-01 08:48:50 +03:00
Glauber Costa	0f64eb7e7d	serialize memtable flush for a memtable_list We can only free memory for a region_group when the entire memtable is released. This means that while the disk can handle requests from multiple memtables just fine, we won't free any memory until all of them finish. If we are under a pressure situation we will take a lot more time to leave it. Ideally, with write-behind, we would allow just one memtable to be flushed at a time. But since we don't have it enabled, it's better to serialize the flushes so that only some memtables (4) are flushed at a time. Having the memtable writer bandwidth all to itself, the memtable will finish sooner, release memory sooner, and recover the system's health sooner. We would like to do that without having streaming and memtables starve each other. Ideally, that should mean half the bandwidth for each - but that sacrifices memtable writes in the common case there is no streaming. Again, write behind will help here, and since this is something we intend to do, there is no need to complicate the code too much for an interim solution. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-05-31 17:18:35 -04:00
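The serialization in 0f64eb7e7d amounts to capping the number of concurrent flushes (4 in the commit message) and queueing the rest. The sketch below is a plain single-threaded model with hypothetical names; the log does not show which primitive the actual patch uses.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

namespace sketch {

class flush_serializer {
    std::size_t _max_concurrent;
    std::size_t _running = 0;
    std::deque<std::function<void()>> _waiting;
public:
    explicit flush_serializer(std::size_t max_concurrent = 4)
        : _max_concurrent(max_concurrent) {}

    // Start the flush now if a slot is free, otherwise queue it.
    void submit(std::function<void()> start_flush) {
        if (_running < _max_concurrent) {
            ++_running;
            start_flush();
        } else {
            _waiting.push_back(std::move(start_flush));
        }
    }

    // Called when a flush completes: frees a slot and starts the next waiter.
    void on_flush_done() {
        --_running;
        if (!_waiting.empty()) {
            auto next = std::move(_waiting.front());
            _waiting.pop_front();
            ++_running;
            next();
        }
    }

    std::size_t running() const { return _running; }
    std::size_t waiting() const { return _waiting.size(); }
};

} // namespace sketch
```

Because fewer memtables share the disk bandwidth at once, each one finishes and releases its memory sooner, which is the motivation given in the commit.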
Glauber Costa	46c79be401	database: allow callers to specify memtable list's flush behavior This patch introduces an explicit behavior enum class - one of delayed or immediate, that allow callers to tell the memtable list whether they want a delayed flush (default), or force an immediate flush. So far this only affects the streaming code (memtables just ignore it), but the concept is one that can be easily generalized. With that in place, we can revert back the stop function to use the standard flush. I have argued before that adding infrastructure like that would not be worth it for the sake of stop alone, but some other code could now use it. Specifically, the active reclaimer for the throttler would like to force immediate flushes, as delayed flushes really won't make a lot of difference in reducing memory usage. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-05-31 17:17:48 -04:00
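The behavior enum from 46c79be401 could be sketched like this. Only the two enumerators and the delayed default come from the commit message; the class, its method names, and the timer model are hypothetical.

```cpp
#include <cstddef>

namespace sketch {

// The two behaviors named in the commit message.
enum class flush_behavior { delayed, immediate };

// Hypothetical stand-in for the streaming memtable list.
class streaming_memtable_list {
    std::size_t _flushes = 0;
    bool _delayed_flush_armed = false;
public:
    void seal_active_memtable(flush_behavior behavior = flush_behavior::delayed) {
        if (behavior == flush_behavior::immediate) {
            _delayed_flush_armed = false;  // no need to wait for the timer
            ++_flushes;
        } else {
            _delayed_flush_armed = true;   // flush later, when the timer fires
        }
    }

    // Models the delay expiring.
    void on_timer() {
        if (_delayed_flush_armed) {
            _delayed_flush_armed = false;
            ++_flushes;
        }
    }

    std::size_t flushes() const { return _flushes; }
    bool delayed_flush_armed() const { return _delayed_flush_armed; }
};

} // namespace sketch
```

A caller such as the throttler's reclaimer would pass `flush_behavior::immediate`, since a delayed flush does little to reduce memory usage right away.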
Avi Kivity	c8b5104aa5	cql3: extract raw batch_statement into raw sub-namespace prepare() was moved to .cc to avoid circular dependencies.	2016-05-31 21:41:26 +03:00
Avi Kivity	1d144699f6	cql3: extract raw delete_statement into raw sub-namespace	2016-05-31 21:24:56 +03:00
Avi Kivity	e596799962	cql3: extract raw update_statement into raw sub-namespace update_statement also has an insert_statement counterpart, convert it too.	2016-05-31 21:16:53 +03:00
Avi Kivity	10213c4211	cql3: extract raw modification_statement into raw sub-namespace	2016-05-31 20:53:37 +03:00
Asias He	f27e5d2a68	messaging_service: Delay listening ms during boot up When a node starts up, peer node can send gossip syn message to it before the gossip message handlers are registered in messaging_service. We can see: scylla[123]: [shard 0] rpc - client a.b.c.d: unknown verb exception 6 ignored To fix, we delay the listening of messaging_service to the point when gossip message handlers are registered. Message-Id: <9b20d85e199ef0e44cdcde2920123a301a88f3d7.1464254400.git.asias@scylladb.com>	2016-05-31 12:28:11 +03:00
Avi Kivity	f3fc3afe00	cql3: optimize make_empty_metadata() All empty metadata objects are equal, so make just one and keep returning it. Message-Id: <1464334638-7971-4-git-send-email-avi@scylladb.com>	2016-05-31 09:12:20 +03:00
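The optimization in f3fc3afe00 boils down to building the empty object once and handing out the same pointer. The sketch uses `std::shared_ptr` and a function-local static, both assumptions; the `metadata` struct here is a hypothetical stand-in for the real class.

```cpp
#include <memory>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-in for the result metadata class.
struct metadata {
    std::vector<std::string> column_names;
};

// All empty metadata objects are equal, so one instance is created on first
// use and the same pointer is returned on every later call.
inline std::shared_ptr<const metadata> make_empty_metadata() {
    static const std::shared_ptr<const metadata> empty =
        std::make_shared<metadata>();
    return empty;
}

} // namespace sketch
```

Returning a pointer to const is what makes sharing safe, which fits the constification done in 0135b4d5cd.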
Avi Kivity	0135b4d5cd	cql3: constify metadata users Metadata usually doesn't change after it is created; make that visible in the code, allowing further optimizations to be applied later. Message-Id: <1464334638-7971-3-git-send-email-avi@scylladb.com>	2016-05-31 09:12:11 +03:00
Avi Kivity	6728454591	cql3: rationalize extract_result_metadata() Rather than dynamic_cast<>ing the statement to see whether it is a select statement, add a virtual function to cql_statement to get the result metadata. This is faster and easier to follow. Message-Id: <1464334638-7971-2-git-send-email-avi@scylladb.com>	2016-05-31 09:12:02 +03:00
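The change in 6728454591 replaces a type test with virtual dispatch. In the sketch below the method name `get_result_metadata`, the default of returning empty metadata, and the `use_statement` class are hypothetical illustrations of that idea.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

struct metadata {
    std::vector<std::string> column_names;
};

class cql_statement {
public:
    virtual ~cql_statement() = default;

    // Statements without a result set report empty metadata by default.
    virtual std::shared_ptr<const metadata> get_result_metadata() const {
        return std::make_shared<metadata>();
    }
};

class select_statement : public cql_statement {
    std::shared_ptr<const metadata> _metadata;
public:
    explicit select_statement(std::vector<std::string> columns)
        : _metadata(std::make_shared<metadata>(metadata{std::move(columns)})) {}

    // Callers no longer need dynamic_cast to find out this is a SELECT.
    std::shared_ptr<const metadata> get_result_metadata() const override {
        return _metadata;
    }
};

// Any other statement type simply inherits the default.
class use_statement : public cql_statement {};

} // namespace sketch
```

The caller works with a `cql_statement` reference only, so adding another statement type with a result set needs no change at the call site.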
Avi Kivity	25b3d74f45	cql3: Split select_statement::raw_statement into raw namespace cql3::select_statement::raw_statement -> cql3::raw::select_statement Message-Id: <1464609556-3756-4-git-send-email-avi@scylladb.com>	2016-05-31 09:09:30 +03:00
Avi Kivity	c8f98c5981	cql3: move cf_statement into raw hierarchy cql3::statements::cf_statement -> cql3::statements::raw::cf_statement Message-Id: <1464609556-3756-3-git-send-email-avi@scylladb.com>	2016-05-31 09:09:21 +03:00
Avi Kivity	caf8d4f0e6	cql3: separate parsed_statement and parsed_statement::prepared cql3::statements::parsed_statement -> cql3::statements::raw::parsed_statement cql3::statements::parsed_statement::prepared -> cql3::statements::prepared_statement Message-Id: <1464609556-3756-2-git-send-email-avi@scylladb.com>	2016-05-31 09:09:10 +03:00


11716 Commits