This patch fixes an issue with the read latency estimated histogram
implementation and adds a call to the estimated histogram of the number
of sstables.
The latter is not yet implemented on the database side.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This patch adds read and write latency estimated histogram support
and adds an estimated histogram of the number of sstables that were used
in a read.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Taking a time measurement of an operation can cause performance
degradation, so this patch adds sampling support for the estimated
histogram.
It allows adding a sample together with a counter that holds the actual
total number of operations so far, so that each sample is counted as
multiple entries in the estimated histogram.
The total count of the entries in the histogram will equal the _count
parameter.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
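The sampling scheme described above can be sketched as follows. This is a
minimal illustration, not the code from the patch: the class name
sampled_histogram, the fixed bucket bounds and the add(value, total)
signature are all assumptions made for the example. Each sample is weighted
by the number of operations seen since the previous sample, so the bucket
totals always add up to the running count.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for an estimated histogram with
// fixed bucket upper bounds. All names are illustrative only.
class sampled_histogram {
    std::vector<std::int64_t> _bounds;   // inclusive upper bound per bucket
    std::vector<std::int64_t> _buckets;  // one extra slot for overflow
    std::int64_t _count = 0;             // operations represented so far
public:
    explicit sampled_histogram(std::vector<std::int64_t> bounds)
        : _bounds(std::move(bounds)), _buckets(_bounds.size() + 1, 0) {}

    // Record one measured value. `total` is the number of operations
    // performed so far, including the ones that were not measured.
    // The sample stands in for every operation since the last sample.
    void add(std::int64_t value, std::int64_t total) {
        if (total <= _count) {
            return; // no new operations to account for
        }
        std::size_t i = 0;
        while (i < _bounds.size() && value > _bounds[i]) {
            ++i;
        }
        _buckets[i] += total - _count;
        _count = total;
    }

    std::int64_t count() const { return _count; }
    std::int64_t bucket(std::size_t i) const { return _buckets.at(i); }
};
```

With bounds {10, 100}, calling add(5, 4) and then add(50, 10) puts 4 entries
in the first bucket and 6 in the second, for a total of 10.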
There's no need to pass keyspace_metadata to notify_drop_keyspace()
because all we are interested in is the name. The keyspace has been
dropped so there's not much we could do with its metadata either.
Simplifies the next patch that wires up drop keyspace notification.
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
"snapshotting the files themselves is easy: if more than one CF happens to link
an SSTable twice, all but one will fail, and we will end up with one copy.
The problem for us, is that the snapshot procedure is supposed to leave a
manifest file inside its directory. So if we just call snapshot() from
multiple shards, only the last one will succeed, writing its own SSTables to
the manifest leaving all other shards' SSTables unaccounted for.
Moreover, for things like drop table, the operation should only proceed when
the snapshot is complete. That includes the manifest file being correctly
written, and for this reason we need to wait for all shards to finish their
snapshotting before we can move on."
Currently, the snapshot code has all shards writing the manifest file. This is
wrong, because every write except the last one will be overwritten. This patch
fixes it by synchronizing all writes and leaving just one of the shards with the
task of closing the manifest.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
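The idea of collecting files from every shard and letting exactly one shard
finish the manifest can be sketched like this. It is an illustration under
stated assumptions: manifest_builder and add_shard() are hypothetical names,
and a std::mutex is used only to keep the example short and self-contained;
the synchronization primitives used by the patch itself are not shown here.

```cpp
#include <cstddef>
#include <mutex>
#include <optional>
#include <set>
#include <string>

// Hypothetical coordinator shared by all shards taking one snapshot.
class manifest_builder {
    std::mutex _m;
    std::set<std::string> _files;  // union of all shards' SSTable names
    std::size_t _pending;          // shards that have not reported yet
public:
    explicit manifest_builder(std::size_t shards) : _pending(shards) {}

    // Each shard calls this exactly once with its own SSTable names.
    // Only the last shard to arrive receives the complete file list and
    // is responsible for writing the manifest; the others get nullopt.
    std::optional<std::set<std::string>>
    add_shard(const std::set<std::string>& files) {
        std::lock_guard<std::mutex> guard(_m);
        _files.insert(files.begin(), files.end());
        if (--_pending == 0) {
            return _files;
        }
        return std::nullopt;
    }
};
```

Because the file names go into a set, an SSTable reported by more than one
shard appears in the manifest only once.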
The way manifest creation is currently done is wrong: instead of a final
manifest containing all files from all shards, the current code writes a
manifest containing just the files from the shard that happens to be the
unlucky loser of the writing race.
In preparation to fix that, separate the manifest creation code from the rest.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
We do need to sync jsondir after we write the manifest file (previously done,
but with a question), and before we start writing it (not previously done) to
guarantee that the manifest file won't reference any file that is not visible yet.
Signed-off-by: Glauber Costa <glommer@scylladb.com>
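For readers unfamiliar with directory syncing, the operation can be sketched
with plain POSIX calls. This is not the code from the patch: sync_directory()
is a hypothetical helper that assumes a Linux/POSIX system, and it blocks,
which a real asynchronous server would want to avoid.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <string>

// Hypothetical helper: flush a directory's entries to stable storage so
// that files created (or linked) in it are durably visible.
// Returns false if the directory cannot be opened or synced.
bool sync_directory(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        return false;
    }
    bool ok = ::fsync(fd) == 0;
    ::close(fd);
    return ok;
}
```

Under this sketch, the order described above would be: sync the directory,
write the manifest, then sync the directory again.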
Currently do_install() does not function correctly when it is passed a glob pattern and the packages are already installed.
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Since we don't want to let the user upgrade libstdc++, we will link libstdc++ statically, using ./configure.py --static-stdc++
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
"* Runs the batchlog loop on only main cpu, but round-robins the actual work
across the available shards.
* Use gate to guard work loop instead of semaphore (better shutdown,
eventually)
* Actually _start_ the batch loop (not done previously)
* Rename logger + add cpu# hint"
Fixes #424
Since replay is a "node global" operation, we should not attempt to
do it in parallel on each shard. It will just overlap/interfere.
We could just run this on cpu 0, but since this _could_ be a
lengthy operation, each timer callback is round-robined across shards
just in case...
Fixes #423
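The round-robin selection described above can be sketched as a small counter
owned by the cpu that runs the timer. The class name shard_round_robin and
its interface are assumptions made for this example; how the chosen shard is
actually invoked is left out.

```cpp
#include <cstddef>

// Hypothetical helper owned by the single cpu that runs the timer.
// Each timer callback asks it which shard should do the next replay.
class shard_round_robin {
    std::size_t _shards;    // number of available shards, must be > 0
    std::size_t _next = 0;  // shard that will handle the next callback
public:
    explicit shard_round_robin(std::size_t shards) : _shards(shards) {}

    // Returns the shard for this callback and advances to the next one,
    // wrapping around after the last shard.
    std::size_t next() {
        std::size_t shard = _next;
        _next = (_next + 1) % _shards;
        return shard;
    }
};
```

Only one replay is in flight per timer tick, so runs do not overlap, while a
lengthy replay does not always land on the same cpu.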
"Changes the "truncated_at" blob contents of system.local table. It now stores
N replay_positions, where N == # shards.
The system.local table schema remains unchanged, and older truncation data
is accepted, though it will for obvious reasons still be insufficient.
Since the data is opaque to the running instance, blob compatibility with
origin should be irrelevant (and we're not really that now anyway).
Note that technically, changing shard count in between runs could make us hold
on to RP data "longer than required", but this is
a.) Insignificant data sizes
b.) Data that is valid exactly once: When restarting a failed node and
replaying. The "shards" only refer to "last run", and after that we don't
care. At worst, we can get less than fresh data (not all shards manage
to save truncation records before crash).
It is worth noting (and I've done so in the code) that the system.local table
+ sharding cause some rather silly inefficiencies, since for this (and others)
we store a value for each shard, and each save causes a global flush of the
systable, in turn delegated to all cores. So the op is N^2 in "db complexity".
At some point we should maybe consider if operations like "drop table" and
"truncate" should not be done on shard level, but on machine level, so it can
coordinate itself. But otoh, it is rare and not _very_ expensive either."
We call the conversion function that expects a NUL-terminated string,
but provide a string view, which is not NUL terminated.
Fix by using the begin/end variant, which doesn't require a NUL terminator.
Fixes #437.
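The difference between the two kinds of conversion can be illustrated with
the standard library. The conversion function used in the patch is not named
here, so std::from_chars is an assumption chosen for the example because it
takes a begin/end pair; parse_int() is a hypothetical wrapper.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parse a decimal integer from a view that need not be NUL terminated.
// Returns nullopt unless the whole view is a valid integer.
std::optional<int> parse_int(std::string_view sv) {
    int value = 0;
    const char* first = sv.data();
    const char* last = first + sv.size();
    // The begin/end variant stops at `last`; it never reads past the view.
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

A view over the first two characters of "12345" parses as 12 here. A
function that relies on a NUL terminator would keep reading into the rest of
the underlying buffer.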
Fixes #423
* CF ID now maps to a truncation record comprised of a set of
per-shard RP:s and a high-mark timestamp
* Retrieving RP:s is done in "bulk"
* Truncation time is calculated as max of all shards.
This version of the patch will accept "old" truncation data, though the
result of applying it will most likely not be correct (just one shard).
The record is still kept as a blob; the "new" format is indicated by
record size.
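The shape of such a record can be sketched as below. The struct and field
names are assumptions made for illustration, and the blob encoding is left
out entirely; the sketch only shows per-shard replay positions next to a
timestamp that is kept as the maximum over all shards.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified replay position.
struct replay_position {
    std::uint64_t segment = 0;
    std::uint32_t offset = 0;
};

// Hypothetical truncation record for one CF.
struct truncation_record {
    std::vector<replay_position> positions;  // index == shard id
    std::int64_t truncated_at = 0;           // high-mark timestamp

    // Store one shard's replay position and raise the high mark if this
    // shard's truncation time is later than what was seen so far.
    void update(std::size_t shard, replay_position rp, std::int64_t ts) {
        if (positions.size() <= shard) {
            positions.resize(shard + 1);
        }
        positions[shard] = rp;
        truncated_at = std::max(truncated_at, ts);
    }
};
```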
Must ensure we still find a chunk/entry boundary even when run
with a start offset, since file navigation is chunk based.
Was not observed as broken previously because
1.) We did not run with offsets
2.) The exception never reached caller.
Also make the reader silently ignore empty files.
type
Allow providing both hash/equal etc for resulting map, as well
as explicit data_types for the deserialization.
Also allow direct extraction of kv-pairs to iterator, for more advanced
unpacking.
Remove the about to be dropped CF from the UUID lookup table before
truncating and stopping it. This closes a race window where new
operations based on the UUID might be initiated after truncate
completes.
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
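The ordering described above can be sketched with a toy lookup table. The
types and member names below are hypothetical and the truncate and stop
steps are reduced to flags; the point is only that the entry is erased
before any teardown work begins, so a lookup by UUID can no longer start new
operations on the CF.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a column family being torn down.
struct column_family {
    bool truncated = false;
    bool stopped = false;
};

class database {
    std::unordered_map<std::string, std::shared_ptr<column_family>> _by_uuid;
public:
    void add(const std::string& uuid) {
        _by_uuid[uuid] = std::make_shared<column_family>();
    }

    std::shared_ptr<column_family> find(const std::string& uuid) const {
        auto it = _by_uuid.find(uuid);
        if (it == _by_uuid.end()) {
            return nullptr;
        }
        return it->second;
    }

    // Unpublish first, then tear down. Returns the dropped CF, or
    // nullptr if the UUID was unknown.
    std::shared_ptr<column_family> drop(const std::string& uuid) {
        auto it = _by_uuid.find(uuid);
        if (it == _by_uuid.end()) {
            return nullptr;
        }
        auto cf = it->second;
        _by_uuid.erase(it);    // no new lookups can reach the CF from here
        cf->truncated = true;  // stands in for truncating the CF
        cf->stopped = true;    // stands in for stopping the CF
        return cf;
    }
};
```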
Almost the whole file is (accidentally) indented four spaces to the
right for no reason. Fix that up because it's annoying as hell.
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
"This patch series implements support for CQL DROP KEYSPACE and makes the
test_keyspace CQL test in dtest pass:
[penberg@nero urchin-dtest]$ nosetests -v cql_tests.py:TestCQL.keyspace_test
keyspace_test (cql_tests.TestCQL) ... ok
----------------------------------------------------------------------
Ran 1 test in 12.166s
OK
[penberg@nero urchin-dtest]$ nosetests -v cql_tests.py:TestCQL.table_test
table_test (cql_tests.TestCQL) ... ok
----------------------------------------------------------------------
Ran 1 test in 23.841s
OK"
When we query schema keyspaces after we have applied a delete mutation,
the dropped keyspace does not exist in the "after" result set. Fix the
merge_keyspaces() algorithm to take that into account.
Makes merge_keyspaces() actually call database::drop_keyspace() when a
keyspace is dropped.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
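The "before" versus "after" comparison can be illustrated with plain sets of
keyspace names. This is a sketch under assumptions: dropped_keyspaces() is a
hypothetical helper, and the real merge works on schema result sets rather
than on name sets.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: a keyspace that is present before the mutation
// was applied but absent afterwards is treated as dropped.
std::vector<std::string>
dropped_keyspaces(const std::set<std::string>& before,
                  const std::set<std::string>& after) {
    std::vector<std::string> dropped;
    // std::set iterates in sorted order, as set_difference requires.
    std::set_difference(before.begin(), before.end(),
                        after.begin(), after.end(),
                        std::back_inserter(dropped));
    return dropped;
}
```

Each returned name is one for which a drop would then be issued.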
We need to capture the "is_local_only" boolean by value because it's an
argument to the function. Fixes an annoying bug where we failed to update
schema version because we pass "true" accidentally.
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
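The capture rule behind this fix can be shown in isolation. make_task() is a
hypothetical function, not one from the patch; it returns a callable that
outlives the call, which is the situation in which a by-reference capture of
a function argument goes wrong.

```cpp
#include <functional>

// Hypothetical function that returns work to be run later.
std::function<bool()> make_task(bool is_local_only) {
    // Capture by value: the parameter is destroyed when make_task()
    // returns, so capturing it with [&] would leave the lambda reading a
    // dangling reference and seeing an arbitrary value.
    return [is_local_only] { return is_local_only; };
}
```

With the by-value capture, make_task(false)() reliably returns false no
matter when the task is finally run.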