From Pekka:
This patch series implements support for CQL DROP TABLE. It uses the newly
added truncate infrastructure under the hood. After this series, the
table_test CQL test in dtest passes:
[penberg@nero urchin-dtest]$ nosetests -v cql_tests.py:TestCQL.table_test
table_test (cql_tests.TestCQL) ... ok
----------------------------------------------------------------------
Ran 1 test in 23.841s
OK
For drop_column_family(), we want to first remove the column_family from
lookup tables and truncate after that to avoid races. Introduce a
truncate() variant that takes keyspace and column_family references.
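
The ordering described above can be sketched as follows. This is only an
illustration in Python, not the actual implementation: the Database class,
its dictionary-based lookup table, and the method signatures are
hypothetical stand-ins.

```python
class ColumnFamily:
    def __init__(self, name):
        self.name = name
        self.rows = {}


class Database:
    """Hypothetical stand-in for the database object and its lookup tables."""

    def __init__(self):
        self._column_families = {}  # (keyspace name, cf name) -> ColumnFamily

    def add_column_family(self, ks, cf):
        self._column_families[(ks, cf.name)] = cf

    def find_column_family(self, ks, name):
        return self._column_families.get((ks, name))

    def truncate(self, ks, cf):
        # Variant that takes references directly, so it still works after
        # the column family has been removed from the lookup table.
        # (In this sketch the keyspace is just a name and is unused.)
        cf.rows.clear()

    def drop_column_family(self, ks, name):
        # Step 1: unregister first, so no new lookup can reach the table.
        cf = self._column_families.pop((ks, name))
        # Step 2: only then truncate, using the reference we still hold.
        self.truncate(ks, cf)
        return cf
```

Because the lookup entry is gone before truncation starts, a concurrent
lookup cannot find the table while its data is being discarded.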
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
Currently, we control incremental backups behavior from the storage service.
This creates some very concrete problems, since the storage service is not
always available and initialized.
The solution is to move it to the column family (and to the keyspace so we can
properly propagate the conf file value). When we change this from the api, we will
have to iterate over all of them, changing the value accordingly.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
We will need to change some properties of the keyspace / cf. We need an accessor
that is not marked as const.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
This patch contains the following changes. In the definition of the read
and write latency histograms, it removes the mask value so that the
default value will be used.
To support gathering the read latency histogram, the query method
cannot be const, as it modifies the histogram statistics.
The read statistic is sample based and should have no real impact on
performance. If there is an impact, we can always change it in the
future to a lower sampling rate.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
When we convert exceptions into CQL server errors, type information is
not preserved. Therefore, improve exception error messages to make
debugging dtest failures, for example, slightly easier.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
column_family
This patch adds a getter for the dirty_memory_region_group in the
database object and adds an occupancy method to column family that
returns the total occupancy across all memtables in the column family.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
All database code was converted to distributed<> when storage_proxy was made
distributed, but then new code was written to use storage_proxy& again.
Passing a distributed<> object is safer since it can be passed between
shards safely. There was a patch to fix one such case yesterday; I found
one more while converting.
Unlike cache, dirty memory cannot be evicted at will, so we must limit it.
This patch establishes a hard limit of 50% of all memory. Above that,
new requests are not allowed to start. This allows the system some time
to clean up memory.
Note that we will need more fine-grained bandwidth control than this;
the hard limit is the last line of defense against running out of reclaimable
memory.
Tested with a mixed read/write load; after reads start to dominate writes
(due to the proliferation of small sstables and the inability of compaction
to keep up), dirty memory usage starts to climb until the hard stop prevents
it from climbing further and OOMing the server.
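
A minimal sketch of such a hard limit, in Python for illustration only. The
DirtyMemoryGate class and its method names are hypothetical; the only thing
taken from the text is the rule that new requests may not start while dirty
memory exceeds 50% of all memory.

```python
class DirtyMemoryGate:
    """Hypothetical admission gate based on dirty memory usage."""

    def __init__(self, total_memory, hard_limit_fraction=0.5):
        self.limit = int(total_memory * hard_limit_fraction)
        self.dirty = 0

    def add_dirty(self, nbytes):
        # Called when a write makes more memory dirty.
        self.dirty += nbytes

    def flushed(self, nbytes):
        # Called when dirty memory has been written out and released.
        self.dirty = max(0, self.dirty - nbytes)

    def can_start_request(self):
        # Above the hard limit, new requests must wait until flushing
        # brings usage back down.
        return self.dirty <= self.limit
```

Requests already in flight are not affected in this sketch; only admission
of new ones is gated.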
"Initial implementation/transposition of commit log replay.
* Changes replay position to be shard aware
* Commit log segment ID:s now follow basically the same scheme as origin;
max(previous ID, wall clock time in ms) + shard info (for us)
* SStables now use the DB definition of replay_position.
* Stores and propagates (compaction) flush replay positions in sstables
* If CL segments are left over from a previous run, they and the existing
sstables are inspected for the high water mark, and then replayed from
those marks to amend mutations potentially lost in a crash
* Note that CPU count change is "handled" insofar as shard matching is
per the _previous_ run's shards, not the current one's.
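
The segment ID scheme from the list above can be sketched like this (Python,
for illustration only). The bit layout is an assumption: the text says only
that the ID is max(previous ID, wall clock time in ms) plus shard info, not
how the shard is encoded. The "+ 1" is also an assumption, added here so
that IDs stay strictly increasing.

```python
import time

# Assumption: the low 48 bits hold the time-derived base and the bits above
# hold the shard number. The real layout is not given in the text.
BASE_BITS = 48
BASE_MASK = (1 << BASE_BITS) - 1


def next_segment_id(previous_id, shard, now_ms=None):
    """Return a new segment ID for `shard`, never going backwards in time."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    previous_base = previous_id & BASE_MASK
    base = max(previous_base + 1, now_ms)
    return (shard << BASE_BITS) | base


def shard_of(segment_id):
    return segment_id >> BASE_BITS


def base_of(segment_id):
    return segment_id & BASE_MASK
```

Keeping the shard in the ID is what lets a later run match each leftover
segment to the shard that wrote it, even if the CPU count has changed.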
Known limitations:
* Mutations deserialized from old CL segments are _not_ fully validated
against existing schemas.
* System::truncated_at (not currently used) does not handle sharding afaik,
so watermark ID:s coming from there are dubious.
* Mutations that fail to apply (invalid, broken) are not placed in blob files
like origin. Partly because I am lazy, but also partly because our serial
format differs, and we currently have no tools to do anything useful with it
* No replay filtering (Origin allows a system property to designate a filter
file, detailing which keyspace/cf:s to replay). Partly because we have no
system properties.
There is no unit test for the commit log replayer (yet).
Because I could not really come up with a good one given the test
infrastructure that exists (tricky to kill stuff just "right").
The functionality is verified by manual testing, i.e. running scylla,
building up data (cassandra-stress), kill -9 + restart.
This of course does not really fully validate whether the resulting DB is
100% valid compared to the one at kill -9, but at least it verifies that replay
took place and mutations were applied.
(Note that origin also lacks validity testing)"
This patch adds the get_non_system_keyspaces method found in origin and
exposes the replication strategy via the get_replication_strategy
method.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Add an API function that returns the count of sstables in L0 if the leveled
compaction strategy is enabled, and 0 otherwise. Currently, we don't
support the leveled compaction strategy, so the function always
returns zero.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
It was noticed that the same sstable files could be selected for
compaction if concurrent compaction happens on the same cf.
That's possible because compaction manager uses 2 tasks for
handling compactions.
Solution is to not duplicate cf in the compaction manager queue,
and re-schedule compaction for a cf if needed.
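
One way to picture the fix, as a Python sketch only. The CompactionQueue
class and its method names are hypothetical. Tracking column families that
are currently being compacted (not just queued ones) is an assumption made
here so that two tasks can never work on the same cf at once.

```python
from collections import deque


class CompactionQueue:
    """Hypothetical queue holding each column family at most once."""

    def __init__(self):
        self._queue = deque()
        self._queued = set()    # cfs waiting in the queue
        self._active = set()    # cfs currently being compacted
        self._resubmit = set()  # cfs requested again while active

    def __len__(self):
        return len(self._queue)

    def submit(self, cf):
        if cf in self._queued:
            return False             # already waiting: do not duplicate
        if cf in self._active:
            self._resubmit.add(cf)   # re-schedule once the current run ends
            return False
        self._queue.append(cf)
        self._queued.add(cf)
        return True

    def take(self):
        # Called by a compaction task to pick up the next cf.
        cf = self._queue.popleft()
        self._queued.discard(cf)
        self._active.add(cf)
        return cf

    def done(self, cf):
        # Called by a compaction task when it has finished with cf.
        self._active.discard(cf)
        if cf in self._resubmit:
            self._resubmit.discard(cf)
            self.submit(cf)
```

With two tasks calling take(), each cf can be handed to only one of them at
a time, so the same sstable files are not selected twice.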
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
"This series exposes statistics from the row_cache in the cache_service API.
After this series the following methods will be available:
get_row_hits
get_row_requests
get_row_hit_rate
get_row_size
get_row_entries"
We need a way to remove a column family from the compaction manager
because when dropping a column family we need to make sure that the
compaction manager doesn't hold a reference to it anymore.
So the compaction manager queue now holds column_family entries, allowing us
to cancel requests pertaining to a column family being dropped.
There may be an ongoing compaction for the column family being
dropped, so we also need to wait for its termination.
Testcase for compaction manager was also adapted and improved.
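
The two steps above (cancel queued requests, then wait for any ongoing
compaction) can be sketched with Python's asyncio. This is an illustration
under assumptions: the CompactionManager class and its methods are
hypothetical and do not mirror the real interface.

```python
import asyncio


class CompactionManager:
    """Hypothetical manager tracking queued and in-flight compactions."""

    def __init__(self):
        self._pending = []   # queued column families
        self._running = {}   # cf -> asyncio task of the ongoing compaction

    def submit(self, cf):
        if cf not in self._pending:
            self._pending.append(cf)

    def start_next(self, compact):
        # Must be called from within a running event loop.
        cf = self._pending.pop(0)
        task = asyncio.ensure_future(compact(cf))
        self._running[cf] = task
        return task

    async def remove(self, cf):
        # Step 1: cancel queued requests for the column family.
        self._pending = [c for c in self._pending if c != cf]
        # Step 2: wait for an ongoing compaction of it, if there is one.
        task = self._running.get(cf)
        if task is not None:
            await task
            self._running.pop(cf, None)
```

Once remove() returns, the manager holds no reference to the column family,
so it is safe to finish dropping it.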
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
This exposes the row_cache in the column family; it will be used by the
API to get the row_cache statistics.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
"Histograms are used to collect latency information. In Origin, many of the
operations are timed, which is a potential performance issue. This series adds
an option to sample the operations, where a small number will be timed and
most will only be counted.
This gives an estimate of the statistics while keeping an accurate
count of the total events and having negligible performance impact.
The first users of the modified histogram are the column families, for their
read and write latencies."
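
The sampling idea can be sketched as follows (Python, for illustration
only). The SampledLatencyHistogram class, the timed_call helper, and the
"every Nth operation" sampling rule are assumptions; the text says only that
a small share of operations is timed while all of them are counted.

```python
import time


class SampledLatencyHistogram:
    """Hypothetical histogram that counts everything but times a sample."""

    def __init__(self, sample_every=128):
        self.sample_every = sample_every
        self.count = 0      # exact number of operations seen
        self.samples = []   # latencies of the sampled operations only

    def should_time(self):
        # True if the upcoming operation is one of the sampled ones.
        return self.count % self.sample_every == 0

    def record(self, latency=None):
        self.count += 1
        if latency is not None:
            self.samples.append(latency)

    def estimated_mean(self):
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


def timed_call(hist, op):
    """Run op(), reading the clock only when the histogram asks for it."""
    if hist.should_time():
        start = time.perf_counter()
        result = op()
        hist.record(time.perf_counter() - start)
    else:
        result = op()
        hist.record()
    return result
```

The total count stays exact, while the latency figures are estimates based
on the sampled subset.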
Conflicts:
database.hh
We can catch most errors when we try to load an sstable. But if the TOC file is
the one missing, we won't try to load the sstable at all. This case is still an
invalid case, but it is way easier for us to treat it by waiting for all files
to be loaded, and then checking if we saw a file during scan_dir, without its
corresponding TOC.
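
The check described above can be sketched like this (Python, illustration
only). The function name is hypothetical, and the file naming is an
assumption: each file is taken to be "<sstable prefix>-<Component>", with
the TOC component named "TOC.txt".

```python
def find_sstables_missing_toc(filenames):
    """Return prefixes of sstables seen during a scan that have no TOC."""
    components = {}
    for name in filenames:
        # Assumption: the component is whatever follows the last '-'.
        prefix, _, component = name.rpartition("-")
        if not prefix:
            continue  # not an sstable component file under this convention
        components.setdefault(prefix, set()).add(component)
    # Run the check only after all files have been seen.
    return sorted(
        prefix for prefix, comps in components.items()
        if "TOC.txt" not in comps
    )
```

For example, a directory holding la-2-big-Data.db and la-2-big-Index.db but
no la-2-big-TOC.txt would be reported as the prefix "la-2-big".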
Fixes #114
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>