Paweł Dziepak
27014a23d7
treewide: require type info for copying atomic_cell_or_collection
2018-05-31 15:51:11 +01:00
Avi Kivity
404172652e
Merge "Use xxHash for digest instead of MD5" from Duarte
...
"This series changes digest calculation to use a faster algorithm
(xxHash) and to also cache calculated cell hashes that can be kept in
memory to speed up subsequent digest requests.
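The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the series' actual code: `cached_hash_cell` is a hypothetical type, `std::hash` stands in for xxHash, and the computation counter exists only to make the caching observable.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a cell; not Scylla's atomic_cell.
class cached_hash_cell {
    std::string _value;
    mutable std::optional<std::uint64_t> _hash;  // empty until the first digest request
    mutable unsigned _computations = 0;          // counts real hash computations (illustration only)
public:
    explicit cached_hash_cell(std::string value) : _value(std::move(value)) {}

    // Returns the cell's hash, computing it at most once per stored value.
    std::uint64_t hash() const {
        if (!_hash) {
            // Assumption: std::hash is used here in place of xxHash.
            _hash = static_cast<std::uint64_t>(std::hash<std::string>{}(_value));
            ++_computations;
        }
        return *_hash;
    }

    // Changing the value invalidates the cached hash.
    void set_value(std::string value) {
        _value = std::move(value);
        _hash.reset();
    }

    unsigned computations() const { return _computations; }
};
```

Repeated calls to `hash()` on an unchanged cell reuse the stored result, which is the effect the series relies on for cells that stay in memory between digest requests.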
The MD5 hash function has proved to be slow for large cell values:
size = 256; elapsed = 4us
size = 512; elapsed = 8us
size = 1024; elapsed = 14us
size = 2048; elapsed = 21us
size = 4096; elapsed = 33us
size = 8192; elapsed = 51us
size = 16384; elapsed = 86us
size = 32768; elapsed = 150us
size = 65536; elapsed = 278us
size = 131072; elapsed = 531us
size = 262144; elapsed = 1032us
size = 524288; elapsed = 2026us
size = 1048576; elapsed = 4004us
size = 2097152; elapsed = 7943us
size = 4194304; elapsed = 15800us
size = 8388608; elapsed = 31731us
size = 16777216; elapsed = 64681us
size = 33554432; elapsed = 130752us
size = 67108864; elapsed = 263154us
xxHash is a non-cryptographic 64-bit hash (a 128-bit version is a work
in progress) that can be used to replace MD5. It performs much
better:
size = 256; elapsed = 2us
size = 512; elapsed = 1us
size = 1024; elapsed = 1us
size = 2048; elapsed = 2us
size = 4096; elapsed = 2us
size = 8192; elapsed = 3us
size = 16384; elapsed = 5us
size = 32768; elapsed = 8us
size = 65536; elapsed = 14us
size = 131072; elapsed = 28us
size = 262144; elapsed = 59us
size = 524288; elapsed = 116us
size = 1048576; elapsed = 226us
size = 2097152; elapsed = 456us
size = 4194304; elapsed = 935us
size = 8388608; elapsed = 1848us
size = 16777216; elapsed = 4723us
size = 33554432; elapsed = 10507us
size = 67108864; elapsed = 21622us
Performance was tested using a 3 node cluster with 1 cpu and 8GB,
and with the following cassandra-stress loaders. Measurements are for
the read workload.
sudo taskset -c 4-15 ./cassandra-stress write cl=ALL n=5000000 -schema 'replication(factor=3)' -col 'size=FIXED(1024) n=FIXED(4)' -mode native cql3 -rate threads=100
sudo taskset -c 4-15 ./cassandra-stress mixed cl=ALL 'ratio(read=1)' n=10000000 -pop 'dist=gauss(1..5000000,5000000,500000)' -col 'size=FIXED(1024) n=FIXED(4)' -mode native cql3 -rate threads=100
xxhash + caching:
Results:
op rate : 32699 [READ:32699]
partition rate : 32699 [READ:32699]
row rate : 32699 [READ:32699]
latency mean : 3.0 [READ:3.0]
latency median : 3.0 [READ:3.0]
latency 95th percentile : 3.9 [READ:3.9]
latency 99th percentile : 4.5 [READ:4.5]
latency 99.9th percentile : 6.6 [READ:6.6]
latency max : 24.0 [READ:24.0]
Total partitions : 10000000 [READ:10000000]
Total errors : 0 [READ:0]
total gc count : 0
total gc mb : 0
total gc time (s) : 0
avg gc time(ms) : NaN
stdev gc time(ms) : 0
Total operation time : 00:05:05
END
md5:
Results:
op rate : 25241 [READ:25241]
partition rate : 25241 [READ:25241]
row rate : 25241 [READ:25241]
latency mean : 3.9 [READ:3.9]
latency median : 3.9 [READ:3.9]
latency 95th percentile : 5.1 [READ:5.1]
latency 99th percentile : 5.8 [READ:5.8]
latency 99.9th percentile : 8.0 [READ:8.0]
latency max : 24.8 [READ:24.8]
Total partitions : 10000000 [READ:10000000]
Total errors : 0 [READ:0]
total gc count : 0
total gc mb : 0
total gc time (s) : 0
avg gc time(ms) : NaN
stdev gc time(ms) : 0
Total operation time : 00:06:36
END
This translates into a 21% improvement for this workload.
Bigger cell values were also tested:
sudo taskset -c 4-15 ./cassandra-stress write cl=ALL n=1000000 -schema 'replication(factor=3)' -col 'size=FIXED(4096) n=FIXED(4)' -mode native cql3 -rate threads=100
sudo taskset -c 4-15 ./cassandra-stress mixed cl=ALL 'ratio(read=1)' n=10000000 -pop 'dist=gauss(1..1000000,500000,100000)' -col 'size=FIXED(4096) n=FIXED(4)' -mode native cql3 -rate threads=100
xxhash + caching:
Results:
op rate : 19964 [READ:19964]
partition rate : 19964 [READ:19964]
row rate : 19964 [READ:19964]
latency mean : 4.9 [READ:4.9]
latency median : 4.6 [READ:4.6]
latency 95th percentile : 7.2 [READ:7.2]
latency 99th percentile : 11.5 [READ:11.5]
latency 99.9th percentile : 13.6 [READ:13.6]
latency max : 29.2 [READ:29.2]
Total partitions : 10000000 [READ:10000000]
Total errors : 0 [READ:0]
total gc count : 0
total gc mb : 0
total gc time (s) : 0
avg gc time(ms) : NaN
stdev gc time(ms) : 0
Total operation time : 00:08:20
END
md5:
Results:
op rate : 12773 [READ:12773]
partition rate : 12773 [READ:12773]
row rate : 12773 [READ:12773]
latency mean : 7.7 [READ:7.7]
latency median : 7.3 [READ:7.3]
latency 95th percentile : 10.2 [READ:10.2]
latency 99th percentile : 16.8 [READ:16.8]
latency 99.9th percentile : 19.2 [READ:19.2]
latency max : 71.5 [READ:71.5]
Total partitions : 10000000 [READ:10000000]
Total errors : 0 [READ:0]
total gc count : 0
total gc mb : 0
total gc time (s) : 0
avg gc time(ms) : NaN
stdev gc time(ms) : 0
Total operation time : 00:13:02
END
This translates into a 37% improvement for this workload.
Fixes #2884
Tests: unit-tests (release), dtests (smp=2)
Note: dtests are kinda broken in master (> 30 failures), so take the
tests tag with a grain of himalayan salt."
* 'xxhash/v5' of https://github.com/duarten/scylla: (29 commits)
tests/row_cache_test: Test hash caching
tests/memtable_test: Test hash caching
tests/mutation_test: Use xxHash instead of MD5 for some tests
tests/mutation_test: Test xx_hasher alongside md5_hasher
schema: Remove unneeded include
service/storage_proxy: Enable hash caching
service/storage_service: Add and use xxhash feature
message/messaging_service: Specify algorithm when requesting digest
storage_proxy: Extract decision about digest algorithm to use
cache_flat_mutation_reader: Pre-calculate cell hash
partition_snapshot_reader: Pre-calculate cell hash
query::partition_slice: Add option to specify when digest is requested
row: Use cached hash for hash calculation
mutation_partition: Replace hash_row_slice with appending_hash
mutation_partition: Allow caching cell hashes
mutation_partition: Force vector_storage internal storage size
test.py: Increase memory for row_cache_stress_test
atomic_cell_hash: Add specialization for atomic_cell_or_collection
query-result: Use digester instead of md5_hasher
range_tombstone: Replace feed_hash() member function with appending_hash
...
2018-02-08 18:24:58 +02:00
Duarte Nunes
d28bdb25c5
tests/memtable_test: Test hash caching
...
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-02-01 01:02:50 +00:00
Paweł Dziepak
20c460d8f0
tests/memtable: add more reader exception safety tests
2018-01-31 16:05:35 +00:00
Paweł Dziepak
1406ac5088
tests/memtable: add test for reader exception safety
2018-01-30 18:33:26 +01:00
Piotr Jastrzebski
7729bc5e7b
Remove unused mutation_reader_assertions
...
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-01-24 20:56:48 +01:00
José Guilherme Vanz
380bc0aa0d
Swap arguments order of mutation constructor
...
Swap the arguments of the mutation constructor so that it follows the
same convention as the other constructor variants. Refs #3084
Signed-off-by: José Guilherme Vanz <guilherme.sft@gmail.com>
Message-Id: <20180120000154.3823-1-guilherme.sft@gmail.com>
2018-01-21 12:58:42 +02:00
Tomasz Grabiec
7ce02bc22e
tests: memtable: Test that memtable with many versions is a mutation source
2017-12-22 11:06:34 +01:00
Piotr Jastrzebski
b1676db658
Migrate test_virtual_dirty_accounting_on_flush to flat reader
...
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-21 11:47:07 +01:00
Piotr Jastrzebski
b90677272f
Migrate test_adding_a_column_during_reading_doesnt_affect_read_result
...
to flat reader
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-21 11:47:07 +01:00
Piotr Jastrzebski
ddecd385c1
Migrate test_partition_version_consistency_after_lsa_compaction_happens
...
to flat reader
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-21 11:47:07 +01:00
Piotr Jastrzebski
681dc26dd1
Migrate test_fast_forward_to_after_memtable_is_flushed to flat reader
...
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-21 11:47:07 +01:00
Jesse Haber-Kucharsky
fb0866ca20
Move thread_local declarations out of main.cc
...
Since `disk-error-handler.hh` declares these global variables `extern`,
it makes sense to define them in `disk-error-handler.cc` instead of
`main.cc`.
This means that test files don't have to declare them.
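The arrangement can be sketched in a single listing as below. This is only an illustration under assumptions: `io_error_count` and `record_io_error` are hypothetical names, and the three sections would live in separate files in practice.

```cpp
// --- disk-error-handler.hh (sketch): declaration only, visible to every includer ---
extern thread_local unsigned io_error_count;   // hypothetical variable name

// --- disk-error-handler.cc (sketch): the single definition, one instance per thread ---
thread_local unsigned io_error_count = 0;

// --- any other translation unit, e.g. a test: uses the variable without defining it ---
unsigned record_io_error() {
    return ++io_error_count;
}
```

With the definition kept next to the header that declares it, any binary that links the handler's object file gets the variable, so individual tests no longer need their own copy.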
Fixes #2735.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <1eed120bfd9bb3647e03fe05b60c871de2df2a86.1511810004.git.jhaberku@scylladb.com>
2017-11-27 20:27:42 +01:00
Paweł Dziepak
87b600cad8
tests/memtable: add test for flush reader
2017-11-27 20:07:23 +01:00
Paweł Dziepak
32eb6437fd
memtable: make make_flush_reader() return flat_mutation_reader
2017-11-27 20:07:22 +01:00
Piotr Jastrzebski
83fd22face
Add test to reproduce #2854
...
When memtable gets flushed, existing mutation_readers created
for it stop handling fast_forward_to correctly.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f580ac59f3fcec53e7c78ad7a8b6374eb36958c6.1506690042.git.piotr@scylladb.com>
2017-09-29 15:17:53 +02:00
Glauber Costa
80440c0d79
database: rework dirty memory hierarchy
...
Issue #1918 describes a problem, in which we are generating smaller
memtables than we could, and therefore not respecting the flush
criteria.
That happens because group sizes (and limits) for pressure purposes, and
the soft threshold is currently at 40%. This causes system group's
soft threshold to be way below regular's virtual dirty limit and close
to regular group's soft threshold. The system group was very likely to
become under soft pressure when regular was because writes to regular
group are not yet throttled when they cross both soft thresholds.
This is a direct consequence of the linear hierarchy between the regions
and to guarantee that it won't happen we would have to acquire the semaphore
of all ancestor regions when flushing from a child region. While that
works, it can lead to problems on its own, like priority inversion if
the regions have different priorities - like streaming and regular, and
groups lower in the hierarchy, like user, blocking explicit flushes
from their ancestors.
To fix that, this patch reorganizes the dirty memory region groups so
that groups are now completely independent. As a disadvantage, when
streaming happens we will draw some memory from the cache, but we will
live with it for the time being.
Fixes #1918
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-12-13 14:07:53 -05:00
Tomasz Grabiec
c3768fe4de
memtable: Pass dirty_memory_manager& to memtable constructor
...
The implementation assumes that memtable's region group is owned by
dirty_memory_manager, and tries to obtain a reference to it like this:
boost::intrusive::get_parent_from_member(_region.group(), &dirty_memory_manager::_region_group);
This is undefined behavior when the region's group does not come from
dirty manager. It's safer to be explicit about this dependency by
taking a reference to dirty_memory_manager in the constructor.
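A minimal sketch of the explicit-dependency approach follows. The types are heavily simplified and hypothetical (the real `memtable` and `dirty_memory_manager` carry much more state); only the shape of the constructor reflects what the commit describes.

```cpp
#include <cstddef>

// Hypothetical simplified types; not the real Scylla classes.
struct region_group {
    std::size_t used = 0;
};

class dirty_memory_manager {
    region_group _region_group;
public:
    void account(std::size_t bytes) { _region_group.used += bytes; }
    std::size_t used() const { return _region_group.used; }
};

class memtable {
    dirty_memory_manager& _dirty_mgr;  // explicit dependency, supplied by the caller
public:
    explicit memtable(dirty_memory_manager& mgr) : _dirty_mgr(mgr) {}

    // Accounts memory against the manager passed in, with no pointer
    // arithmetic to recover the owner from a member address.
    void apply(std::size_t bytes) { _dirty_mgr.account(bytes); }

    dirty_memory_manager& manager() { return _dirty_mgr; }
};
```

Because the manager is handed in directly, the memtable never has to assume that its region group is embedded inside a `dirty_memory_manager`, which is the assumption that made the `get_parent_from_member` call unsafe.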
2016-12-05 12:59:09 +01:00
Avi Kivity
7faf2eed2f
build: support for linking statically with boost
...
Remove assumptions in the build system about dynamically linked boost unit
tests. Includes seastar update which would have otherwise broken the
build.
2016-10-26 08:51:21 +03:00
Tomasz Grabiec
308434f891
tests: memtable: Add test for partition version list consistency after compaction
2016-10-18 11:57:14 +02:00
Tomasz Grabiec
d836e8f64b
tests: memtable: Add tests for flushing reader
...
Message-Id: <1476454187-11462-1-git-send-email-tgrabiec@scylladb.com>
2016-10-14 15:11:06 +01:00
Duarte Nunes
dc8319ed91
keys: Remove schema argument from make_empty
...
An empty key is independent of the schema.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:36 +02:00
Pekka Enberg
38a54df863
Fix pre-ScyllaDB copyright statements
...
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".
Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Benoît Canet
1fb9a48ac5
exception: Optionally shutdown communication on I/O errors.
...
I/O errors cannot be fixed by Scylla; the only solution
is to shut down the database communications.
Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>
2016-03-17 15:02:52 +02:00
Tomasz Grabiec
a63971ee4c
tests: memtable_test: Add test for concurrent reading and schema changes
2016-01-11 10:34:52 +01:00
Avi Kivity
d5cf0fb2b1
Add license notices
2015-09-20 10:43:39 +03:00
Tomasz Grabiec
8978d0ba1a
tests: Test memtable data survives full compaction
2015-08-06 14:05:16 +02:00
Avi Kivity
c720cddc5c
tests: mv tests/urchin/* -> tests/
...
Now that seastar is in a separate repository, we can use the tests/
directory.
2015-08-05 14:16:52 +03:00