Commit Graph

4118 Commits

Paweł Dziepak
c5e617ea78 schema: add NAME_LENGTH constant
It's probably not the best place for this constant, but that's where
it is in Origin.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-06-23 16:17:45 +02:00
Paweł Dziepak
7de8f2cda8 cql3: validate properties at keyspace creation
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-06-23 16:17:44 +02:00
Paweł Dziepak
cb8f8f84be cql3: make property_definitions::validate() take const references
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-06-23 16:17:44 +02:00
Tomasz Grabiec
de23b54764 types: Implement to_string()/from_string() for boolean_type 2015-06-23 17:07:37 +03:00
Vlad Zolotarov
efe1696410 cql_test_env: Start the global snitch before storage service
This order is required since 5e1348e741
(storage_service: Use get_local_snitch_ptr in gossip_snitch_info).

This fixes the breakage in the cql_query_test.

Reported-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-23 16:36:27 +03:00
Avi Kivity
20b86ea839 Merge seastar upstream 2015-06-23 15:57:09 +03:00
Gleb Natapov
12f3d53372 storage_proxy: cleanup leftovers from timer consolidation 2015-06-23 15:43:59 +03:00
Gleb Natapov
2be9dfc242 storage_proxy: use fb_utilities::get_broadcast_address()
Fixes some FIXMEs.
2015-06-23 15:43:59 +03:00
Gleb Natapov
67ea1b0ec8 Revert "db: hold onto write response handler until timeout handler is executed"
This reverts commit 52aa0a3f91.

After c9909dd183 this is no longer needed since reference to a
handler is not used in abstract_write_response_handler::wait() continuation.

Conflicts:
	service/storage_proxy.cc
2015-06-23 15:43:59 +03:00
Avi Kivity
20e7e3576e Merge "Introduce simple cache" from Tomasz
"This introduces a very simple cache which caches whole partitions.

There is a reclaimer registered which clears all caches upon memory pressure.
This is a temporary measure until we implement log-structured allocator and
incremental eviction.

I can see that for small data sets this series improves cassandra-stress read
throughput from 2k to 50k tps on muninn/huginn."
2015-06-23 15:30:21 +03:00
Tomasz Grabiec
d4e0e5957b db: Integrate cache with the read path 2015-06-23 13:49:25 +02:00
Tomasz Grabiec
d11773bc14 tests: Introduce row cache test 2015-06-23 13:49:24 +02:00
Tomasz Grabiec
2b5d9a917f tests: Convert mutation_reader_test to use seastar threads
Also extract assertions into header file.
2015-06-23 13:49:24 +02:00
Tomasz Grabiec
e40638823e db: Introduce mutation cache
The row_cache class is meant to cache data for a given table by wrapping
some underlying data source. It hands out a mutation_reader which
uses in-memory data when possible, or delegates to the underlying reader
and populates the cache on-the-fly.

Accesses to data in the cache are tracked for eviction purposes by a
separate entity, the cache_tracker. There is one such tracker for the
whole shard.
2015-06-23 13:49:24 +02:00
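The wrap-and-populate behaviour described above can be sketched in plain C++. This is a minimal, illustrative stand-in (the names and the string-valued "partition" are assumptions, not the real scylla API), with clear() playing the role of the reclaimer flushing the cache on memory pressure:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Illustrative sketch of the row_cache idea: serve a partition from an
// in-memory map when present, otherwise delegate to the underlying data
// source and populate the cache on the way out.
class row_cache {
    std::map<std::string, std::string> _partitions;          // key -> cached partition
    std::function<std::string(const std::string&)> _source;  // underlying reader
public:
    explicit row_cache(std::function<std::string(const std::string&)> src)
        : _source(std::move(src)) {}

    std::string read(const std::string& key) {
        auto it = _partitions.find(key);
        if (it != _partitions.end()) {
            return it->second;              // cache hit: no underlying read
        }
        auto value = _source(key);          // cache miss: delegate
        _partitions.emplace(key, value);    // populate on-the-fly
        return value;
    }

    // What the reclaimer does upon memory pressure in this series.
    void clear() { _partitions.clear(); }
};
```

A second read of the same key never touches the underlying source until the cache is cleared, which is where the 2k-to-50k tps difference mentioned in the merge message comes from.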
Tomasz Grabiec
b9288d9fa7 db: Make column_family managed by lw_shared_ptr<>
It will be share-owned by readers.
2015-06-23 13:49:24 +02:00
Tomasz Grabiec
8fd466338d mutation_reader: Introduce helper for consuming all mutations 2015-06-23 13:49:23 +02:00
Tomasz Grabiec
83e7a21dfb mutation: Add apply() helper which works on mutation_opt 2015-06-23 13:49:23 +02:00
Tomasz Grabiec
bdd3fd5019 tests: Add missing blank line 2015-06-23 13:44:37 +02:00
Gleb Natapov
c9909dd183 cluster: consolidate mutation clustering timers
Currently mutation clustering uses two timers: one expires when the wait
for cl times out and is canceled when cl is achieved; the other expires if
some endpoints do not answer for a long time (cl may already be achieved
at this point, with the first timer canceled). This is too complicated,
especially since both timers can expire simultaneously. Simplify it by
having only one timer and checking in its callback whether cl was achieved.
2015-06-23 14:42:56 +03:00
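The single-timer design can be sketched as follows. This is a hedged simplification, not the actual storage_proxy code: the handler and field names are illustrative, and the real handler deals with futures and endpoint tracking.

```cpp
#include <cassert>

// One timer per write: its callback checks whether the consistency
// level (cl) was already achieved, instead of juggling a cl timer and
// a slow-endpoint timer that may fire simultaneously.
struct write_response_handler {
    int acks_needed;            // responses required to satisfy cl
    int acks_received = 0;
    bool timed_out = false;

    bool cl_achieved() const { return acks_received >= acks_needed; }

    // The single timer callback: report a timeout only if cl was not
    // reached; either way the handler can now be released.
    void on_timer() {
        if (!cl_achieved()) {
            timed_out = true;   // would surface as "Mutation write timeout"
        }
    }
};
```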
Tomasz Grabiec
14a8110d1f future: Avoid copying of the result in get0()
Even though we accept std::tuple<T...>&&, the named parameter 'x' is an
l-value inside get0(), so the result was copied rather than moved out.
2015-06-23 14:28:01 +03:00
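The pitfall and the fix can be shown with a simplified stand-in for seastar's future::get0() (this is not the real implementation, just the same shape):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <tuple>
#include <utility>

// A named parameter is an l-value even when its type is an r-value
// reference; without std::move(), std::get<0>(x) would copy the first
// element. Moving forwards the r-valueness and moves the result out.
template <typename... T>
auto get0(std::tuple<T...>&& x) {
    return std::get<0>(std::move(x));   // move, not copy
}
```

That this compiles for a move-only element type (e.g. std::unique_ptr) demonstrates that no copy is made; without the std::move() it would not compile for such a type.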
Asias He
5e1348e741 storage_service: Use get_local_snitch_ptr in gossip_snitch_info 2015-06-23 12:12:33 +03:00
Avi Kivity
51eb220c49 Merge "stream reader and writer" from Asias 2015-06-23 11:31:10 +03:00
Asias He
0f8c35a2fe streaming: Convert StreamReader.java to C++ 2015-06-23 16:06:28 +08:00
Asias He
9456fb3935 streaming: Import StreamReader.java 2015-06-23 16:06:28 +08:00
Asias He
4f8a1041ce streaming: Convert StreamWriter.java to C++ 2015-06-23 16:06:28 +08:00
Asias He
5f4293b379 streaming: Import StreamWriter.java 2015-06-23 16:06:28 +08:00
Asias He
fe61c0d8d4 streaming: Convert more of file_message_header.hh 2015-06-23 16:06:28 +08:00
Asias He
90b9d6294f streaming: Convert CompressionInfo.java to C++ 2015-06-23 16:06:28 +08:00
Asias He
b3b2d26305 streaming: Import CompressionInfo.java 2015-06-23 16:06:27 +08:00
Avi Kivity
792a19d40d Merge "global snitch" from Vlad
"
   - Introduce a global distributed snitch object.
   - Add the corresponding methods in i_endpoint_snitch class needed to work with
     this object.
   - Added additional check to gossiping_property_file_snitch_test.
"
2015-06-23 10:49:30 +03:00
Avi Kivity
cbd0be5a68 Merge seastar upstream 2015-06-23 10:46:39 +03:00
Shlomi Livne
f458d6c54a tests: generate separate boost xml output files
The single generated file is corrupted from time to time. Switch to using
multiple files in the hope that this will resolve the issue.

Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-06-23 10:26:24 +03:00
Avi Kivity
c214b3bff9 Merge "initial compaction support" from Nadav 2015-06-23 09:49:08 +03:00
Nadav Har'El
9f7794752f sstables: basic compaction test
This tests the basic compaction functionality: I created three small
sstables using Cassandra (see the commands below); the test compacts them
into one, loads the resulting sstable and checks its content.

This test also demonstrates a bug (that part is commented out so the test
still succeeds): if a partition had old values and then a newer deletion
(tombstone) in another sstable, both the values and the tombstone are left
behind in the compacted table. This will be fixed (and the test uncommented)
in a later patch.

The three sstables were created with:

USE try1;
CREATE TABLE compaction (
	name text,
	age int,
	height int,
	PRIMARY KEY (name)
);
INSERT INTO compaction (name, age) VALUES ('nadav', 40);
INSERT INTO compaction (name, age) VALUES ('john', 30);
<flush>
INSERT INTO compaction (name, height) VALUES ('nadav', 186);
INSERT INTO compaction (name, age, height) VALUES ('jerry', 40, 170);
<flush>
DELETE FROM compaction WHERE name = 'nadav';
INSERT INTO compaction (name, age) VALUES ('john', 20);
INSERT INTO compaction (name, age, height) VALUES ('tom', 20, 180);

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-06-23 09:48:59 +03:00
Nadav Har'El
f26dae3bf9 sstable: basic compaction function
This patch adds the basic compaction function sstables::compact_sstables,
which takes a list of input sstables, and creates several (currently one)
merged sstable. This implementation is pretty simple once we have all
the infrastructure in place (combining reader, writer, and a pipe between
them to reduce context switches).

This is already working compaction, but not quite complete: we'll need
to add compaction strategies (which sstables to compact, and when), a
better cardinality estimator, sstable management and renaming, and a lot
of other details, and we'll probably still need to change the API.
But we can already write a test for compacting existing sstables (see
the next patch), and I wanted to get this patch out of the way, so we can
start working on applying compaction in a real use case.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-06-23 09:48:58 +03:00
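The shape of the merge can be sketched like this. It is a heavily hedged simplification: real compaction merges mutations streamed through readers and handles tombstones and expiry, whereas here each partition is a single row and the newest write simply wins; compact() and the row struct are illustrative, not the sstables::compact_sstables API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct row {
    std::string key;     // partition key
    long timestamp;      // write timestamp
    std::string value;
};

// Merge several input sstables into one output, keeping one entry per
// partition key; the entry with the newest timestamp wins.
static std::vector<row> compact(const std::vector<std::vector<row>>& inputs) {
    std::map<std::string, row> merged;   // sorted by partition key
    for (const auto& sstable : inputs) {
        for (const auto& r : sstable) {
            auto it = merged.find(r.key);
            if (it == merged.end() || it->second.timestamp < r.timestamp) {
                merged[r.key] = r;       // newer write wins
            }
        }
    }
    std::vector<row> out;
    for (const auto& e : merged) {
        out.push_back(e.second);         // emit in key order, like an sstable
    }
    return out;
}
```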
Nadav Har'El
6063d4502f sstable: method for estimating number of partitions in sstable
The sstable holds a lot of data but, surprisingly, an accurate count of the
number of partitions isn't available. We can get a good estimate by looking
at the number of summary entries.

Based on Origin's IndexSummary.getEstimatedKeyCount().

We need this estimate for compaction when we can't (yet) get a better
estimate from the cardinality estimator algorithm.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-06-23 09:48:57 +03:00
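The estimate is essentially one multiplication, modelled on Origin's IndexSummary.getEstimatedKeyCount(). A sketch, with assumed parameter names (the real interval is the summary's sampling level):

```cpp
#include <cassert>
#include <cstdint>

// The summary samples roughly every 'min_index_interval'-th partition
// key, so (number of summary entries) * interval approximates the
// number of partitions in the sstable.
static uint64_t estimated_partition_count(uint64_t summary_entries,
                                          uint64_t min_index_interval) {
    return summary_entries * min_index_interval;
}
```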
Vlad Zolotarov
319491dad7 gossiping_property_file_snitch_test: check that the distribution
Check that the distribution of the new values between shards works.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-22 23:18:40 +03:00
Vlad Zolotarov
67bb1ba132 gossiping_property_file_snitch: Register creators for all parameters set options.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-22 23:18:40 +03:00
Vlad Zolotarov
3520d4de10 locator: introduce a global distributed<snitch_ptr> i_endpoint_snitch::snitch_instance()
Snitch class semantics are defined to be per-node. To make it so, we
introduce a static member in the i_endpoint_snitch class that holds the
pointer to the relevant snitch class instance.

Since the snitch contents are not always purely const, it has to be per
shard; therefore we make it a "distributed". All the I/O is going to take
place on a single shard and, if there are changes, they are going to be
propagated to the rest of the shards.

The application is responsible for initializing this distributed<snitch>
before it's used for the first time.

This patch effectively reverts most of the "locator: futurize
snitch creation" a2594015f9 patch - the part that modified the
code creating the snitch instance. Since the snitch is now created
explicitly by the application, and all the rest of the code simply
assumes that the above global is initialized, we no longer need those
changes, and the code gets back to being as nice and simple as it was
before that patch.

So, to summarize, this patch does the following:
   - Reverts the changes introduced by a2594015f9 that made every creation of
     a replication strategy also create a snitch stored in the strategy object.
     More specifically, methods like keyspace::create_replication_strategy() no
     longer return a future<>, which significantly simplifies the calling code.
   - Introduce the global distributed<snitch_ptr> object:
      - It belongs to the i_endpoint_snitch class.
      - A corresponding interface has been added to access both the global and
        the shard-local instances.
      - locator::abstract_replication_strategy::create_replication_strategy() no
        longer accepts snitch_ptr&& - it fetches the corresponding shard-local
        snitch instance and passes it to the replication strategy's constructor
        by itself.
      - Adjusted the existing snitch infrastructure to the new semantics:
         - Modified the create_snitch() to create and start all per-shard snitch
           instances and update the global variable.
         - Introduced a static i_endpoint_snitch::stop_snitch() function that properly
           stops the global distributed snitch.
         - Added the code to the gossiping_property_file_snitch that distributes the
           changed data to all per-shard snitch objects.
         - Made all existing snitch classes properly maintain their state in order
           to be able to shut down cleanly.
         - Patched both urchin and cql_query_test to initialize a snitch instance before
           all other services.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v6:
   - Rebased to the current master.
   - Extended the commit message a little - the summary.

New in v5:
   - database::create_keyspace(): added a missing _keyspaces.emplace()

New in v4:
   - Kept database::create_keyspace() returning future<> at Glauber's request,
     and added a description noting that this method needs to change when
     Glauber adds his bits that require this interface.
2015-06-22 23:18:31 +03:00
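The per-shard layout described above can be sketched in plain C++. This is only an analogy, standing in for seastar's distributed<snitch_ptr> (the class and member names are assumptions): one snitch instance per shard, with updates performed once and propagated to every shard's copy.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct snitch {
    std::string rack;
    std::string dc;
};

// Stand-in for distributed<snitch_ptr>: one instance per shard.
class distributed_snitch {
    std::vector<snitch> _per_shard;
public:
    explicit distributed_snitch(unsigned shards) : _per_shard(shards) {}

    // Shard-local access, like i_endpoint_snitch's local instance getter.
    snitch& local(unsigned shard) { return _per_shard[shard]; }

    // All I/O happens on one shard; changes are then pushed to the
    // other shards' copies (seastar would use invoke_on_all for this).
    void propagate(const snitch& s) {
        for (auto& inst : _per_shard) {
            inst = s;
        }
    }
};
```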
Avi Kivity
6ce39d8399 tests: move a semaphore test from futures_test to semaphore_test 2015-06-22 19:14:05 +03:00
Glauber Costa
a7d612f196 schema tables: add missing columns
We left some columns in a FIXME state because we didn't have all the types
implemented to reflect them; in particular, all collection types were left
behind.

Now that we do, let's refresh the system table's schemas.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-22 19:09:38 +03:00
Avi Kivity
0eb8d7384a Reduce partitioner's dependencies on sstables/*.hh 2015-06-22 19:00:55 +03:00
Avi Kivity
04a25474e4 tombstone: make its print operator nicer 2015-06-22 19:00:55 +03:00
Avi Kivity
78add789c3 Merge "Fix bug with old sstable" from Glauber
"We have found a bug when reading an old sstable. Some versions of Cassandra
will not use start_range as a marker, but rather 0.

We need to account for that possibility."
2015-06-22 18:23:16 +03:00
Glauber Costa
b008f8cf06 sstables: test table with wrong marker
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-22 11:13:42 -04:00
Glauber Costa
6336c02cab sstables: fix bug with old sstable
Some versions of Origin will write 0 instead of -1 as the start-of-range
marker for a range tombstone. I've just come across one such table, which
ended up breaking our code. Let's be more flexible in what we accept; we
don't really have a choice.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-22 11:13:42 -04:00
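The leniency the fix adds amounts to accepting either value when parsing the marker. A one-function sketch (the function name is illustrative, not the actual parser):

```cpp
#include <cassert>
#include <cstdint>

// Accept either -1 (the expected start-of-range marker) or 0 (what some
// Origin versions actually write) for a range tombstone's start marker.
static bool is_start_of_range_marker(int32_t marker) {
    return marker == -1 || marker == 0;
}
```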
Raphael S. Carvalho
118e4fc8be sstable: make do_write_components more pleasant to read
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-22 16:27:47 +03:00
Tomasz Grabiec
b8db713b81 service: Increase write timeout to 2 seconds
The current timeout is 100ms. cassandra-stress often fails for me
because of this, with a "Mutation write timeout" message.

The comment says that the timeout value is based on
DatabaseDescriptor.getWriteRpcTimeout(), which in Origin is equal to 2
seconds by default, so bump it up.

Code pointers:

DatabaseDescriptor:L844

    public static long getWriteRpcTimeout()
    {
        return conf.write_request_timeout_in_ms;
    }

Config:L74

  public volatile Long write_request_timeout_in_ms = 2000L;
2015-06-22 15:45:51 +03:00
Avi Kivity
acb56b580f abstract_replication_strategy: work around missing _snitch in cql_query_test 2015-06-22 15:28:22 +03:00
Avi Kivity
ddc84f3459 Merge "Adding the storage_proxy API" from Amnon
"This series adds the storage_proxy API with a stab API implementation.

It covers the API defined in StorageProxyMBean; it does not contain the metrics
associated with the storage proxy, which will be added in a different series."
2015-06-22 14:54:17 +03:00