Commit Graph

655 Commits

Author SHA1 Message Date
Duarte Nunes
9ffdf4a5cd db: Implement size_estimates_recorder
This patch implements the size_estimates_recorder, which periodically
writes size estimates for all non-system column families to the
size_estimates system table. The size_estimates_recorder class
corresponds to the one in Cassandra's SizeEstimatesRecorder.java.

Estimation is carried out by shard 0. Since we're estimating based on
data in shared sstables, having multiple shards doing this would skew
the results.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-19 09:44:58 +00:00
Duarte Nunes
f81329be60 sstables: sstables::key delegates to composite
The sstables::key class now delegates much of its functionality
to the composite class. All existing behavior is preserved.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-11 23:37:33 +02:00
Duarte Nunes
1ffae6e6ee database_test: Add test case for row limit
This patch introduces database_test and adds a test case to ensure
the row limit is respected when querying multiple partition ranges.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20160623111723.17523-1-duarte@scylladb.com>
2016-06-23 14:20:34 +02:00
Paweł Dziepak
db5ea591ad add mvcc implementation for mutation_partitions
To ensure isolation of operations when streaming a mutation from a
mutable source (such as cache or memtable), MVCC is used.

Each entry in memtable or cache is actually a list of in-use versions of
that entry. Incoming writes are either applied directly to the latest
version (if it isn't being read by anyone) or prepended to the list
(if the former head is being read by someone). When a reader finishes, it
tries to squash versions together, provided there is no other reader that
could prevent this.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
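
The versioning scheme described in db5ea591ad can be sketched roughly as follows. This is a minimal Python illustration, not Scylla's C++ implementation: the names `Version`, `Entry`, `apply`, `open_reader`, `close_reader` and `snapshot`, the dict-based contents, and the reader counting are all assumptions made for the example.

```python
class Version:
    """One version of an entry; `readers` counts readers pinned to it."""
    def __init__(self, data=None):
        self.data = dict(data or {})
        self.readers = 0


class Entry:
    """Hypothetical memtable/cache entry: a list of versions, newest first."""
    def __init__(self):
        self.versions = [Version()]

    def apply(self, write):
        head = self.versions[0]
        if head.readers == 0:
            # Nobody is reading the latest version: apply the write in place.
            head.data.update(write)
        else:
            # The head is being read: prepend a new version holding the delta.
            self.versions.insert(0, Version(write))

    def open_reader(self):
        head = self.versions[0]
        head.readers += 1
        return head

    def snapshot(self, version):
        # A reader sees its own version layered over all older ones.
        start = self.versions.index(version)
        view = {}
        for v in reversed(self.versions[start:]):
            view.update(v.data)
        return view

    def close_reader(self, version):
        version.readers -= 1
        self._squash()

    def _squash(self):
        # Fold an older version into its newer neighbour once nobody reads it.
        i = 0
        while i + 1 < len(self.versions):
            newer, older = self.versions[i], self.versions[i + 1]
            if older.readers == 0:
                newer.data = {**older.data, **newer.data}
                del self.versions[i + 1]
            else:
                i += 1
```

A write that arrives while a reader holds the head creates a second version; the reader keeps seeing its own snapshot, and the two versions collapse back into one when the reader is closed.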
Paweł Dziepak
4992ea9949 tests: add test for anchorless_list
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
c8f4b96e76 tests: add streamed_mutation_tests
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
262337768a streamed_mutation: introduce mutation_fragment
This commit introduces mutation_fragment class which represents the parts
of mutation streamed by streamed_mutation.

mutation_fragment can be:
 - a static row (only one in the mutation)
 - a clustering row
 - start of range tombstone
 - end of range tombstone

There is an ordering (implemented in position_in_partition class) between
mutation_fragment objects. It reflects the order in which content of
partition appears in the sstables.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
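
As a rough illustration of the ordering mentioned in 262337768a, the Python sketch below sorts fragments so that the static row comes first and the remaining fragments follow by clustering position. It is not the position_in_partition implementation: the names `Kind`, `Fragment` and `stream_order` are hypothetical, clustering keys are simplified to plain integers, and the tie-break between fragments at the same key (range start, then row, then range end) is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import IntEnum


class Kind(IntEnum):
    # The relative order of kinds at an equal key is an assumption.
    STATIC_ROW = 0
    RANGE_TOMBSTONE_START = 1
    CLUSTERING_ROW = 2
    RANGE_TOMBSTONE_END = 3


@dataclass(frozen=True)
class Fragment:
    kind: Kind
    key: int = 0  # simplified clustering key; unused for the static row

    def position(self):
        # The static row precedes every clustered fragment in the partition.
        is_clustered = self.kind != Kind.STATIC_ROW
        return (is_clustered, self.key, self.kind)


def stream_order(fragments):
    """Return the fragments in the order a stream would emit them."""
    return sorted(fragments, key=Fragment.position)
```

Real clustering keys are multi-component prefixes, so a faithful comparison needs more care than the integer comparison used here.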
Duarte Nunes
91aac30f12 mutations: Row tombstones are now a set of ranges
This patch changes the type of the mutation partition's row_tombstones
to be a range_tombstone_list, so that they are now represented as a
set of disjoint ranges. All of its usages are updated accordingly.

Fixes #1155

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:59 +02:00
Duarte Nunes
f7809bcaef range_tombstone_list: Add unit test
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
Duarte Nunes
86030885c8 mutations: Introduce range tombstone list
This class is responsible for representing a set of range tombstones
as a list of disjoint (non-overlapping) range tombstones.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
Duarte Nunes
6a111fdd01 mutations: Introduce the range_tombstone class
This patch introduces the range_tombstone class, composed of
a [start, end] pair of clustering_key_prefixes, the type
of inclusiveness of each bound, and a tombstone.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
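
The idea behind 6a111fdd01 and 86030885c8 (a range tombstone, and a list that keeps such tombstones disjoint) can be sketched as below. This is an illustrative Python model under stated assumptions, not the Scylla classes: keys are plain integers rather than clustering key prefixes, every range is half-open `[start, end)` instead of carrying a per-bound inclusiveness flag, a tombstone is reduced to a timestamp, and the newer timestamp is assumed to win where ranges overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeTombstone:
    start: int      # inclusive lower bound (assumed integer keys)
    end: int        # exclusive upper bound
    timestamp: int  # deletion time; the newer tombstone wins on overlap


class RangeTombstoneList:
    def __init__(self):
        self.ranges = []  # kept sorted and pairwise disjoint

    def add(self, rt):
        if rt.start >= rt.end:
            return  # empty range, nothing to record
        candidates = self.ranges + [rt]
        # Cut the key space at every bound so each piece is covered uniformly.
        bounds = sorted({b for r in candidates for b in (r.start, r.end)})
        merged = []
        for lo, hi in zip(bounds, bounds[1:]):
            covering = [r.timestamp for r in candidates
                        if r.start <= lo and hi <= r.end]
            if not covering:
                continue  # gap between tombstones
            ts = max(covering)
            last = merged[-1] if merged else None
            if last is not None and last.end == lo and last.timestamp == ts:
                # Coalesce adjacent pieces that carry the same tombstone.
                merged[-1] = RangeTombstone(last.start, hi, ts)
            else:
                merged.append(RangeTombstone(lo, hi, ts))
        self.ranges = merged
```

Rebuilding the whole list on every insertion keeps the sketch short; a production structure would only touch the ranges that overlap the new one.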
Pekka Enberg
b6b2c84316 Merge "CQL tracing" from Vlad
"This series introduces a tracing infrastructure that may be used
for tracing the execution of CQL commands and measuring the latencies of the separate
stages of CQL handling, as defined by the CQL binary protocol specification.

To begin tracing one should create a "tracing session", which may then
be used to issue tracing events.

If execution of a specific CQL command involves other Nodes (not only a Coordinator),
then a "tracing session ID" is passed to that Node (in the context of the
corresponding RPC call). Then this "session ID" may be used to create a
"secondary tracing session" to issue tracing events in the context of the original session.

The series contains an implementation of tracing that uses a keyspace in the current
cluster for storing tracing information.

This series contains a demo per-request tracing instrumentation of a QUERY
CQL command and even this instrumentation is partial: it only fully instruments
a QUERY->SELECT->read_data call chain.

This is only the very beginning of the proper instrumentation that is
to come.

Right now the latencies for a single SELECT of a single row with RF 1 from a 2-node cluster
on my laptop, started using ccm (for C*: all default parameters; for scylla: memory 256MB, --smp 2),
are as follows (pseudo-graphics warning):
--------------------------------------------------------------------------------------------
                                       | scylla (2 Nodes x 2 shards each)  |     C* 2.1.8
_______________________________________|___________________________________|________________
Coordinator and replica are same Node  |                                   |
(TRACING OFF):                         |                0.3ms              |     0.3ms
c-s with a single thread mean latency  |      (was 0.2ms before the last   |
value                                  |       rebase with a master)       |
--------------------------------------------------------------------------------------------
Coordinator and replica are same Node  |                                   |
(TRACING ON)                           |                ~250us             |     ~1200us
Running a SELECT command from a cqlsh  |                                   |
a few times                            |                                   |
--------------------------------------------------------------------------------------------
Coordinator and replica are not on the |                                   |
same Node                              |                ~700us             |     >2500us
(TRACING ON)                           |                                   |
--------------------------------------------------------------------------------------------

To begin tracing one may use the cqlsh "TRACING ON/OFF" commands:

cqlsh> TRACING ON
Now Tracing is enabled
cqlsh> select "C0", "C1" from keyspace1.standard1  where key=0x12345679;

 C0                 | C1
--------------------+------
 0x000000000001e240 | null

(1 rows)

Tracing session: 146f0180-21e7-11e6-b244-000000000000

 activity                                                          | timestamp                  | source    | source_elapsed
-------------------------------------------------------------------+----------------------------+-----------+----------------
 select "C0", "C1" from keyspace1.standard1  where key=0x12345679; | 2016-05-24 22:38:24.536000 | 127.0.0.1 |              0
                              message received from /127.0.0.1 [0] | 2016-05-24 22:38:24.537000 | 127.0.0.2 |             --
                                          Done reading options [0] | 2016-05-24 22:38:24.537000 | 127.0.0.1 |              3
                                    read_data handling is done [0] | 2016-05-24 22:38:24.537000 | 127.0.0.2 |             37
                                           Parsing a statement [0] | 2016-05-24 22:38:24.537000 | 127.0.0.1 |              3
                                        Processing a statement [0] | 2016-05-24 22:38:24.537000 | 127.0.0.1 |             56
                          Done processing - preparing a result [0] | 2016-05-24 22:38:24.537000 | 127.0.0.1 |            550
                                                  Request complete | 2016-05-24 22:38:24.536560 | 127.0.0.1 |            560

cqlsh>"
2016-06-02 08:35:33 +03:00
Avi Kivity
c7953897d1 build: remove obsolete log.cc dependency
2016-06-01 22:35:07 +03:00
Vlad Zolotarov
a53d329b25 tracing: add a serializable trace_info object
tracing::trace_info is used to pass the tracing information between nodes.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:16:53 +03:00
Vlad Zolotarov
c965528a03 tracing: add a trace_state and tracing classes
trace_state: Is a single tracing session.
tracing:     A sharded service that contains an i_trace_backend_helper instance
             and is a "factory" of trace_state objects.

trace_state main interface functions are:
   - begin(): Start time counting (should be used via tracing::begin() wrapper).
   - trace(): Create a tracing event - it's coupled with a time passed since begin()
              (should be used via tracing::trace() wrapper).
   - ~trace_state(): Destructor will close the tracing session.

"tracing" service main interface functions are:
   - start(): Initialize a backend.
   - stop():  Shut down a backend.
   - create_session(): Creates a new tracing session.

(tracing::end_session(): Is called by a trace_state destructor).

When trace_state needs to store a tracing event it uses a backend helper from
a "tracing" service.

A "tracing" service limits the number of open tracing sessions to a static number.
If this number is reached, new sessions will be dropped.

trace_state implements a similar strategy with regard to tracing events per single
session.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:13:42 +03:00
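
The session and event limits described in c965528a03 can be sketched as follows. This is a minimal Python model, not the sharded C++ service: the class names, the limit parameters `max_sessions` and `max_events_per_session`, the drop counters, and the explicit `close()` call (standing in for the trace_state destructor) are assumptions made for illustration.

```python
import time


class TraceState:
    """A single tracing session with a capped number of events."""
    def __init__(self, tracing, max_events):
        self._tracing = tracing
        self._max_events = max_events
        self._started = None
        self._closed = False
        self.events = []
        self.dropped_events = 0

    def begin(self):
        # Start time counting for this session.
        self._started = time.monotonic()

    def trace(self, message):
        if self._started is None:
            self.begin()  # sketch-only convenience: start the clock lazily
        if len(self.events) >= self._max_events:
            self.dropped_events += 1  # over the per-session limit: drop
            return
        elapsed_us = int((time.monotonic() - self._started) * 1_000_000)
        self.events.append((message, elapsed_us))

    def close(self):
        # Stands in for the destructor, which ends the session.
        if not self._closed:
            self._closed = True
            self._tracing.end_session(self)


class Tracing:
    """Factory of TraceState objects with a static cap on open sessions."""
    def __init__(self, max_sessions=2, max_events_per_session=3):
        self.max_sessions = max_sessions
        self.max_events = max_events_per_session
        self.open_sessions = 0
        self.dropped_sessions = 0

    def create_session(self):
        if self.open_sessions >= self.max_sessions:
            self.dropped_sessions += 1  # limit reached: the session is dropped
            return None
        self.open_sessions += 1
        return TraceState(self, self.max_events)

    def end_session(self, state):
        self.open_sessions -= 1
```

Returning `None` for a dropped session lets callers skip tracing cheaply; how the real service signals a dropped session is not stated in the commit message.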
Vlad Zolotarov
d3988a8113 tracing::trace_keyspace_helper: a keyspace based i_tracing_backend_helper implementation
Uses a CQL keyspace system_traces to store tracing information.

Uses two tables:

CREATE TABLE system_traces.sessions (
session_id uuid,
command text,
client inet,
coordinator inet,
duration int,
parameters map<text, text>,
request text,
started_at timestamp,
PRIMARY KEY ((session_id)))

and

CREATE TABLE system_traces.events (
session_id uuid,
event_id timeuuid,
activity text,
source inet,
source_elapsed int,
thread text,
PRIMARY KEY ((session_id), event_id))

system_traces.sessions table contains records of tracing sessions.
system_traces.sessions columns description:
   - session_id:  an ID of the session.
   - command:     type of a command this session was created for
                  (currently supported "NONE", "QUERY" and "REPAIR").
   - client:      IP of the client that issued the command.
   - coordinator: IP of a coordinator that received the command.
   - duration:    total duration of the tracing session (in us).
   - parameters:  optional parameters for this session, passed to
                  i_trace_state::begin() call.
   - request:     a CQL command this tracing session is created for.
   - started_at:  the time the session has been started at.

system_traces.events contains records of separate tracing events.
system_traces.events columns description:
   - session_id:     an ID of the session.
   - event_id:       an ID of the event.
   - activity:       the trace point description - a message given to
                     i_trace_state::trace().
   - source:         IP of the Node where trace event was issued.
   - source_elapsed: time passed since creation of a tracing session (in us) on
                     the Node where this trace event was issued.
   - thread:         name of the thread in whose context this trace event was
                     issued (currently it's "core N", where 'N' is the index of
                     the shard the trace event was issued on).

This class caches lambdas that create the corresponding mutations for each tracing
record to be stored, until the flush() method is called.

flush() merges all pending mutations for the "sessions" and "events" tables, then
applies the mutation to the "events" table and, once that completes, to the "sessions"
table. This ensures that when a tracing session is visible, all its
events are visible too.

trace_keyspace_helper exposes a few metrics via collectd:
   - tracing_error - a total number of errors (not including OOM)
   - bad_column_family_errors - number of times a tracing record wasn't
                                stored because system_traces tables' schema
                                didn't match the expected value. This may happen if
                                a DB administrator is doing funny things like altering
                                the schemas of the above tables.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:12:19 +03:00
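
The deferred-write and ordering behaviour described in d3988a8113 can be sketched like this. It is a synchronous Python approximation, not the real helper: `apply_mutation` is a hypothetical storage hook, `store_event` and `store_session` are invented names, and records are plain dicts rather than mutations.

```python
class TraceKeyspaceHelper:
    """Buffers deferred record builders and writes events before sessions."""
    def __init__(self, apply_mutation):
        # apply_mutation(table, rows) is a hypothetical storage hook.
        self._apply = apply_mutation
        self._pending = {"events": [], "sessions": []}

    def store_event(self, session_id, activity):
        # Cache a lambda; the record is only built when flush() runs.
        self._pending["events"].append(
            lambda: {"session_id": session_id, "activity": activity})

    def store_session(self, session_id, request):
        self._pending["sessions"].append(
            lambda: {"session_id": session_id, "request": request})

    def flush(self):
        pending, self._pending = self._pending, {"events": [], "sessions": []}
        # Events go first, so a visible session always has its events.
        for table in ("events", "sessions"):
            rows = [make() for make in pending[table]]
            if rows:
                self._apply(table, rows)
```

Because the "sessions" write is issued only after the "events" write has returned, a reader that finds a session row can expect its events to be present already.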
Avi Kivity
d2e4548b35 Merge seastar upstream
* seastar 0bcdd28...864d6dc (4):
  > Logging framework
  > Add libubsan and libasan to fedora deps docs
  > tests: add rpc cancellable tests
  > rpc: add cancellable interface

Dropped logging implementation in favor of seastar's due to a link
conflict with operator<<.
2016-06-01 18:28:42 +03:00
Avi Kivity
a3b23d75b9 Merge "Fix Prepared message metadata serialization"
"The Prepared message has a metadata section that's similar to result set
metadata but not exactly the same. Fix serialization by introducing a
separate prepared_metadata class like Origin has and implement
serialization as per the CQL protocol specification. This fixes one CQL
binary protocol version 4 issue that we currently have.

The changes have been verified by running the gocql integration tests
using v4. Please note that this series does *not* enable v4 for clients
because Cassandra 2.1.x series only supports CQL binary protocol v3."
2016-05-16 18:59:54 +03:00
Pekka Enberg
a68671e247 cql3: Add column_specification::all_in_same_table() helper
We need it for the prepared_metadata class that we're about to introduce.
2016-05-16 14:13:31 +03:00
Pekka Enberg
adfb4d7bbd cql3: Move result_set class implementation to source file
2016-05-16 13:20:45 +03:00
Piotr Jastrzebski
8307681975 Introduce clustering_ranges type.
It will be used to slice data returned by mutation_readers.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-05-16 11:46:09 +02:00
Pekka Enberg
b5d9aa866d Merge "Fixes for schema synchronization" from Tomek
"Writes may start to be rejected by replicas after issuing an ALTER TABLE
 that doesn't affect columns. This affects all versions with ALTER TABLE
 support.

 Fixes #1258"
2016-05-12 09:43:25 +03:00
Tomasz Grabiec
90c31701e3 tests: Add unit tests for schema_registry
2016-05-11 17:31:22 +02:00
Calle Wilund
5c36d2e09e alter_keyspace_statement: Implement
Note: Like create keyspace, we don't properly validate 
replication strategy yet.
2016-05-10 14:36:17 +00:00
Takuya ASADA
ec2ef467c8 configure.py: add --static-thrift option to link libthrift statically
This is needed for Ubuntu packaging, to drop the dependency on libthrift0 at installation time.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461594460-2403-1-git-send-email-syuu@scylladb.com>
2016-04-25 17:44:44 +03:00
Duarte Nunes
fbf70e9bed udt: Add alter type statement
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:07 +02:00
Duarte Nunes
809b45e160 udt: Add drop type statement
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:02 +02:00
Pekka Enberg
4ed702f0da Merge "Authorizer support" from Calle
"Conversion/implementation of "authorizer" code from origin, handling
 permissions management for users/resources.

 Default implementation keeps mapping of <user.resource>->{permissions}
 in a table, contents of which is cached for slightly quicker checks.

 Adds access control to all (existing) cql statements.
 Adds access management support to the CQL impl. (GRANT/REVOKE/LIST)

 Verified manually and with dtest auth_test.py. Note that several of these
 still fail due to (unrelated) unimplemented features, like index, types
 etc.

 Fixes #1138"
2016-04-19 15:00:38 +03:00
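
The default authorizer described in 4ed702f0da keeps a (user, resource) to permissions mapping in a table and caches its contents. A minimal Python sketch of that shape follows. The class and method names are hypothetical, an in-memory dict stands in for the system table, and invalidating the cache on GRANT/REVOKE is an assumption of this sketch rather than something the commit message states.

```python
class Authorizer:
    def __init__(self):
        # (user, resource) -> set of permissions; stands in for the system table.
        self._table = {}
        self._cache = {}
        self.table_reads = 0  # counts cache misses, for illustration only

    def grant(self, user, resource, permission):
        key = (user, resource)
        self._table.setdefault(key, set()).add(permission)
        self._cache.pop(key, None)  # assumed invalidation policy

    def revoke(self, user, resource, permission):
        key = (user, resource)
        self._table.get(key, set()).discard(permission)
        self._cache.pop(key, None)

    def permissions(self, user, resource):
        key = (user, resource)
        if key not in self._cache:
            # Cache miss: consult the backing table once and remember the result.
            self.table_reads += 1
            self._cache[key] = frozenset(self._table.get(key, ()))
        return self._cache[key]

    def check(self, user, resource, permission):
        return permission in self.permissions(user, resource)
```

Repeated checks for the same user and resource are then served from the cache, which is the "slightly quicker checks" effect the merge message refers to.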
Calle Wilund
14cc47d8b9 cql3::statements::revoke_statement: Initial conversion
2016-04-19 11:49:06 +00:00
Calle Wilund
4e1ef3c1bc cql3::statements::grant_statement: Initial conversion
2016-04-19 11:49:05 +00:00
Calle Wilund
04c37def3a cql3::statements::list_permissions_statement: Initial conversion
2016-04-19 11:49:05 +00:00
Calle Wilund
fe23447f6f cql3::statements::permission_altering_statement: Initial conversion
Alter permission base type
2016-04-19 11:49:05 +00:00
Calle Wilund
add2111c0a cql3::statements::authorization_statement: Initial conversion
Auth cql base type
2016-04-19 11:49:05 +00:00
Calle Wilund
1f0bbf2d9a auth::authorizer: Initial conversion
Main authorization endpoint. Default (and only) real authorizer
keeps a mapping resource -> permission sets in system table
2016-04-19 11:49:04 +00:00
Raphael S. Carvalho
29db5f5e1f sstables: move compaction strategy code to a new source file
Moving compaction strategy code from sstables/compaction.cc to
sstables/compaction_strategy.cc
That improves readability. Strategy code should be separated
from the generic compaction code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <5af6fc8f7321351a071fc0ce03c80ffea21f8396.1460761821.git.raphaelsc@scylladb.com>
2016-04-19 08:45:43 +03:00
Calle Wilund
b8bd77e621 cql3::list_users_statement: Initial conversion
2016-04-11 09:10:41 +00:00
Calle Wilund
adaf21403b cql3::drop_user_statement: Initial conversion
2016-04-11 09:10:41 +00:00
Calle Wilund
8732b3eed7 cql3::alter_user_statement: Initial conversion
2016-04-11 09:10:41 +00:00
Calle Wilund
da89189308 cql3::create_user_statement: Initial conversion
2016-04-11 09:10:41 +00:00
Calle Wilund
57f5bb854f cql3::authentication_statement: cql auth base class
2016-04-11 09:10:41 +00:00
Calle Wilund
cef52d1653 cql3::user_options: Add options wrapper type
2016-04-11 09:10:41 +00:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Pekka Enberg
4892a6ded9 build: Invoke Seastar build only once
Make sure we invoke the Seastar ninja build only once from our own build
process so that we don't have multiple ninjas racing with each other.

Refs #1061.

Message-Id: <1458563076-29502-1-git-send-email-penberg@scylladb.com>
2016-03-21 16:22:11 +02:00
Benoît Canet
1fb9a48ac5 exception: Optionally shutdown communication on I/O errors.
I/O errors cannot be fixed by Scylla; the only solution
is to shut down the database communications.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>
2016-03-17 15:02:52 +02:00
Asias He
9f64c36a08 storage_service: Fix pending_range_calculator_service
Since calculate_pending_ranges will modify token_metadata, we need to
replicate to other shards. With this patch, when we call
calculate_pending_ranges, token_metadata will be replicated to other
non-zero shards.

In addition, it is not useful as a standalone class. We can merge it
into the storage_service. Kill one singleton class.

Fixes #1033
Refs #962
Message-Id: <fb5b26311cafa4d315eb9e72d823c5ade2ab4bda.1457943074.git.asias@scylladb.com>
2016-03-14 10:14:22 +02:00
Pekka Enberg
97bef4fb7c build: Fix http/http_response_parser.hh dependency
Make sure http/http_response_parser.hh, which is pulled in by locator/ec2_snitch.hh, is
built. The commit is similar to what commit 6ccf8f8 ("build: make sure
to ask seastar to build http/request_parser.hh, and depend on it") did
for request_parser.hh.

Fixes the following build error on CentOS:

  In file included from ./locator/ec2_multi_region_snitch.hh:41:0,
                   from locator/ec2_multi_region_snitch.cc:39:
  ./locator/ec2_snitch.hh:24:40: fatal error: http/http_response_parser.hh: No such file or directory

Spotted by Shlomi.
Message-Id: <1457612266-315-1-git-send-email-penberg@scylladb.com>
2016-03-10 14:46:41 +01:00
Pekka Enberg
2566f8dc18 configure: Remove 'scylla_libs' variable
It's not actually used by anyone so drop it.
Message-Id: <1457531753-27891-2-git-send-email-penberg@scylladb.com>
2016-03-09 14:56:54 +01:00
Pekka Enberg
9bfb6a0c5b configure: Add boost date_time library as a dependency
It's needed to fix the debug build.
Message-Id: <1457531753-27891-1-git-send-email-penberg@scylladb.com>
2016-03-09 14:56:51 +01:00
Takuya ASADA
da56325f69 configure.py: add support --static-stdc++ for seastar binaries (iotune)
The Ubuntu 14.04 LTS package is currently broken because iotune is not statically linked against libstdc++; this patch fixes that.
Requires a seastar patch that adds --static-stdc++ to configure.py.

Fixes #982

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456995050-22007-1-git-send-email-syuu@scylladb.com>
2016-03-03 12:18:47 +02:00
Avi Kivity
bec30ccf25 build: add order-only dependency between building antlr .o and IDL headers
This ensures that if an antlr generated .cpp file depends on an
IDL-generated .hh file, then that .hh is generated before the .o is
built.
2016-03-03 09:52:25 +02:00