Commit Graph

8785 Commits

Author SHA1 Message Date
Glauber Costa
2cd756ae5e repair: replace a magic number with another magic number
In due time we will have to fix this, but as an interim step, let's use
a "better" magic number.

The problem with 100 is that as soon as the partitions start to get bigger,
we're using too much memory. Since this is multiplied by the number of token
ranges, and happens in every shard, the final number can become really big,
and the amount of resources we use goes up proportionally.

This means that even if we are mistaken about the new number (we probably are),
in this case it is better to err on the side of a more conservative resource
usage.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <97158f3db5734916cee4ccf12eaa66e7402570bb.1457448855.git.glauber@scylladb.com>
2016-03-08 17:29:00 +02:00
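As a rough illustration of the scaling argument in 2cd756ae5e above, the sketch below multiplies out the factors the message names. The function name, the assumption that every token range on every shard holds a full batch at once, and all numbers are hypothetical; none of them are taken from the repair code.

```python
def repair_memory_estimate(partitions_per_batch, avg_partition_bytes,
                           token_ranges, shards):
    """Rough upper bound, in bytes, on memory held by repair on one node.

    Assumes (hypothetically) that each token range on each shard keeps one
    full batch of partitions in memory at the same time.
    """
    return partitions_per_batch * avg_partition_bytes * token_ranges * shards


# Same batch size, but partitions that are 1000x larger.
small_partitions = repair_memory_estimate(100, 1_000, 256, 16)
large_partitions = repair_memory_estimate(100, 1_000_000, 256, 16)

# Halving the batch size halves the estimate.
halved_batch = repair_memory_estimate(50, 1_000_000, 256, 16)
```

The estimate is linear in every factor, which is why a smaller per-batch constant is the conservative choice when partition sizes are unknown.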
Nadav Har'El
b7e29691c2 sstables: avoid index and data file over-reads
When we do a streaming read that knows the expected *end* position of the
read, we can use a large read-ahead buffer, and at the same time, stop
reading at exactly the intended end (or small rounding of it to the DMA
block size) and not waste resources blindly reading a large amount of data
after the end just to fill the read-ahead buffer.

The sstable reading code, both for reading the data file and the index file,
created a file input stream without specifying its end, thereby losing
this optimization - so when a large buffer was used, we would get a large
over-read. This patch fixes this, so sstable data file and index file are
read using a file input stream which is aware of its end.

Fixes #964.

Note that this patch does not change the behavior when reading a
*compressed* data file. For compressed read, we did not have the problem
of over-read in the first place, because chunks are read one by one.
But we do have other sources of inefficiencies there (stemming, again,
from the fact that the compressed chunks are read one by one), and I
opened a separate issue #992 for that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1457219304-12680-1-git-send-email-nyh@scylladb.com>
2016-03-08 17:26:10 +02:00
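The over-read described in b7e29691c2 above can be sketched as follows. This is a minimal model and not the sstable or Seastar code: plan_reads, over_read, and the buffer and block sizes are hypothetical, and the start offset is assumed to be block-aligned.

```python
def plan_reads(start, end, buffer_size, block_size=4096, known_end=True):
    """Return the (offset, length) reads issued to cover [start, end).

    With a known end, the last read stops at `end` rounded up to the
    DMA block size; without it, every read fills the whole buffer.
    """
    limit = -(-end // block_size) * block_size if known_end else None
    reads = []
    pos = start
    while pos < end:
        length = buffer_size
        if limit is not None:
            length = min(length, limit - pos)
        reads.append((pos, length))
        pos += length
    return reads


def over_read(reads, end):
    """Bytes read past the intended end position."""
    last_offset, last_length = reads[-1]
    return last_offset + last_length - end


aware = plan_reads(0, 10_000, buffer_size=128 * 1024, known_end=True)
blind = plan_reads(0, 10_000, buffer_size=128 * 1024, known_end=False)
```

In this model the end-aware stream wastes at most one block of rounding, while the blind stream wastes most of its read-ahead buffer.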
Calle Wilund
8575f1391f lists.cc: fix update insert of frozen list
Fixes #967

Frozen lists are just atomic cells. However, old code inserted the
frozen data directly as an atomic_cell_or_collection, which in turn
meant it lacked the header data of a cell. When in turn it was
handled by internal serialization (freeze), since the schema said
it was not a (non-frozen) collection, we tried to look at frozen
list data as cell header -> most likely considered dead.
Message-Id: <1457432538-28836-1-git-send-email-calle@scylladb.com>
2016-03-08 13:48:45 +01:00
Pekka Enberg
81af486b69 Update scylla-ami submodule
* dist/ami/files/scylla-ami d4a0e18...84bcd0d (1):
  > Add --ami parameter
2016-03-08 13:49:31 +02:00
Takuya ASADA
254b0fa676 dist: show message to use XFS for scylla data directory and also notify about developer mode, when iotune fails
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1457426286-15925-1-git-send-email-syuu@scylladb.com>
2016-03-08 12:20:33 +02:00
Pekka Enberg
83d82ea901 Merge "Fix Ubuntu package issues on AMI" from Takuya
"This fixes bugs on Ubuntu package and AMI scripts, closes #991."
2016-03-08 11:51:30 +02:00
Takuya ASADA
18a27de3c8 dist: export all entries on /etc/default/scylla-server on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-08 18:18:30 +09:00
Gleb Natapov
ce6d1a242a storage_proxy: fix background_reads counter
background_reads collectd counter was not always properly decremented.
Fix it and streamline background read repair error handling.

Message-Id: <20160307182255.GI4849@scylladb.com>
2016-03-07 19:41:09 +01:00
Yoav Kleinberger
1cd01cd2ab tools/scyllatop: defend against curses "out of screen bounds" error
Fixes issue #945 (hopefully)
This issue was probably the result of trying to write outside the
confines of the window. The views.Base class now defends against this.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <9735806b211567f3239e187d87437c484f532291.1457265435.git.yoav@scylladb.com>
2016-03-07 18:02:26 +01:00
Raphael S. Carvalho
0f4239d63a service: improve logging of storage_service::load_new_sstables
Closes #952.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <2402f387c32d2d1221e740edb67e56c1593c1936.1457366098.git.raphaelsc@scylladb.com>
2016-03-07 18:01:52 +01:00
Raphael S. Carvalho
e850c1406e sstables: update comment
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <8abc1c6c66ed8d3bb35ecfb6d8251de3f61a97ae.1457093016.git.raphaelsc@scylladb.com>
2016-03-07 17:36:34 +01:00
Raphael S. Carvalho
822759eee0 compaction_manager: update stat pending_tasks properly
Size of both _cfs_to_cleanup and _cfs_to_compact must be added when
calculating a new value to _stats.pending_tasks.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <b601e24d0631922798575f39d00fb54fe00d4971.1457093016.git.raphaelsc@scylladb.com>
2016-03-07 17:36:03 +01:00
Gleb Natapov
2d092bbd32 storage_proxy: send read requests with timeout
No need to wait for replies long after request is timed out.
Message-Id: <1457351304-28721-2-git-send-email-gleb@scylladb.com>
2016-03-07 14:00:11 +01:00
Gleb Natapov
4122422d19 storage_proxy: always wait for digest read resolver done future
Currently it is waited upon only if background read repair check is
needed and this causes an unhandled exception warning to be printed if
it enters failed state. Fix this by always waiting on it, but doing
anything beyond ignoring an exception only if check is needed.
Message-Id: <1457351304-28721-1-git-send-email-gleb@scylladb.com>
2016-03-07 14:00:09 +01:00
Gleb Natapov
626c9d046b fix EACH_QUORUM handling during bootstrapping
Currently write acknowledgements handling does not take bootstrapping
node into account for CL=EACH_QUORUM. The patch fixes it.

Fixes #994

Message-Id: <20160307121620.GR2253@scylladb.com>
2016-03-07 13:56:34 +01:00
Raphael S. Carvalho
d65642cee8 fix storage_service::load_new_sstables() to not disable write permanently
Avi says:
"If an exception happens, then enable_sstable_writes won't be called."

The problem is fixed by catching a possible exception and enabling sstable
write for the relevant column family if it wasn't enabled already.

Closes #953.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <32c1bcb2c60c7b9e5514eb0a95062f40ca92093a.1457119308.git.raphaelsc@scylladb.com>
2016-03-07 13:56:02 +01:00
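The fix in d65642cee8 above follows a common pattern: re-enable on the error path whatever was disabled before the risky step. The sketch below shows that pattern only; ColumnFamily, its methods, and the `load` callback are hypothetical stand-ins, not the storage_service API.

```python
class ColumnFamily:
    """Hypothetical stand-in for a column family with a write switch."""

    def __init__(self):
        self.sstable_writes_enabled = True

    def disable_sstable_writes(self):
        self.sstable_writes_enabled = False

    def enable_sstable_writes(self):
        self.sstable_writes_enabled = True


def load_new_sstables(cf, load):
    """Run `load` with sstable writes disabled, always re-enabling them."""
    cf.disable_sstable_writes()
    try:
        result = load()
    except Exception:
        # On failure, re-enable writes if they are still disabled,
        # then let the caller see the original error.
        if not cf.sstable_writes_enabled:
            cf.enable_sstable_writes()
        raise
    cf.enable_sstable_writes()
    return result
```

Without the except branch, a failing `load` would leave writes disabled for that column family until restart, which is the bug the commit describes.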
Gleb Natapov
f59415b3c6 Take pending endpoints into account while checking for sufficient live nodes
During bootstrapping additional copies of data have to be made to ensure
that CL level is met (see CASSANDRA-833 for details). Our code does
that, but it does not take into account that the bootstrapping node can be
dead, which may cause a request to proceed even though there are not
enough live nodes for it to be completed. In such a case the request neither
completes nor times out, so it appears to be stuck from the CQL layer's POV. The
patch fixes this by taking pending nodes into account while checking
that there are enough live nodes for the operation to proceed.

Fixes #965

Message-Id: <20160303165250.GG2253@scylladb.com>
2016-03-07 13:30:13 +01:00
Gleb Natapov
8dad399256 log: add space between log level and date in the output
It was dropped by 6dc51027a3

Message-Id: <20160306125313.GI2253@scylladb.com>
2016-03-07 13:06:06 +01:00
Tomasz Grabiec
9deb036e4e Merge branch 'dev/issue-845-set-incremental-backup-config-v1' from seastar-dev.git
From Vlad:

This series modifies the 'database' class to use the internal
_enable_incremental_backups value (initialized with
'incremental_backups' configuration value) instead of using the
'incremental_backups' configuration value directly.

Then we update this internal value in runtime from 'nodetool
enable/disablebackup' API callback so that newly created keyspaces and
column families use the newly configured incremental backup
configuration.
2016-03-07 10:47:20 +01:00
Tomasz Grabiec
b3e56549ca Merge branch 'dev/issue-909-synchronization-part-v2' from seastar-dev.git
From Vlad:

This series fixes the first part of issue #909 (the second part has a
separate github issue #965) which is a discrepancy between a
storage_service::token_metadata and a gossiper::endpoint_state_map
contents on non-zero shards.
2016-03-07 10:20:15 +01:00
Paweł Dziepak
99b61d3944 lsa: set _active to nullptr in region destructor
In region destructor, after the active segment is freed, the pointer to it is
left unchanged. This confuses the remaining parts of the destructor
logic (namely, removal from region group) which may rely on the
information in region_impl::_active.

In this particular case the problem was that code removing from the
region group called region_impl::occupancy() which was
dereferencing _active if not null.

Fixes #993.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1457341670-18266-1-git-send-email-pdziepak@scylladb.com>
2016-03-07 10:15:28 +01:00
Takuya ASADA
9ee14abf24 dist: export sysconfig for scylla-io-setup.service
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-07 18:13:30 +09:00
Takuya ASADA
3d9dc52f5f Revert "Revert "dist: align ami option with others (-a --> --ami)""
This reverts commit 66c5feb9e9.

Conflicts:
	dist/common/scripts/scylla_sysconfig_setup

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-07 18:13:30 +09:00
Takuya ASADA
c9882bc2c4 Revert "Revert "Revert "dist: remove AMI entry from sysconfig, since there is no script refering it"""
This reverts commit 643beefc8c.

Conflicts:
	dist/common/scripts/scylla_sysconfig_setup
	dist/common/sysconfig/scylla-server

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-07 17:15:42 +09:00
Takuya ASADA
c888eaac74 dist: add /etc/scylla.d/io.conf on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-07 17:15:42 +09:00
Vlad Zolotarov
2cd836a02e api::set_storage_service(): fix the 'nodetool enablebackup' API
'nodetool enable/disablebackup' callback was modifying only the
existing keyspaces and column families configurations.
However new keyspaces/column families were using
the original 'incremental_backups' configuration value which could
be different from the value configured by 'nodetool enable/disablebackup'
user command.

This patch updates the database::_enable_incremental_backups per-shard
value in addition to updating the existing keyspaces and column families
configurations.

Fixes #845

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 17:26:31 +02:00
Vlad Zolotarov
a45ecaf336 database: store "incremental backup" configuration value in per-shard instance
Store the "incremental_backups" configuration value in the database
class (and use it when creating a keyspace::config) in order to be
able to modify it in runtime.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 17:22:48 +02:00
Vlad Zolotarov
87e6efcdab storage_service: distribute gossiper::endpoint_state_map together with token_metadata
If storage_service::token_metadata is not distributed together with
gossiper::endpoint_state_map there may be a situation when a non-zero
shard sees a new value in token_metadata (e.g. newly added node's
token ranges) while still seeing an old gossiper::endpoint_state_map
contents (e.g. the newly added node mentioned above may not be present,
thus causing gossiper::is_alive() to return FALSE for that node, while
the node is actually alive and kicking).

To avoid this discrepancy we will always update a token_metadata together
with an endpoint_state_map when we distribute new token_metadata data
among shards.

Fixes #909

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 13:15:19 +02:00
Vlad Zolotarov
3a72ef87f2 gossiper: make _shadow_endpoint_state_map public and rename
We will need to access it from a storage_service class when replicating
token_metadata.

Rename _shadow_endpoint_state_map -> shadow_endpoint_state_map
according to our coding convention.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 11:16:44 +02:00
Vlad Zolotarov
4a21d48cc5 gossiper: use a semaphore instead of a future<> for serializing a timer callback
Use a semaphore to allow serializing with a gossiper's timer callback.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 11:16:44 +02:00
Takuya ASADA
6dc51027a3 log: make log.cc able to compile with g++-4.9
std::put_time() is not implemented on g++-4.9, so replace it with strftime().
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1457024183-893-1-git-send-email-syuu@scylladb.com>
2016-03-04 12:48:43 +01:00
Avi Kivity
6c2e57b003 Merge seastar upstream
* seastar ba615c7...906b562 (1):
  > rpc: prepare some more for feature negotiation
2016-03-03 18:22:57 +02:00
Gleb Natapov
b89b6f442b storage_proxy: fix race between read cl completion and timeout in digest resolver
If timeout happens after cl promise is fulfilled, but before
continuation runs it removes all the data that cl continuation needs
to calculate result. Fix this by calculating result immediately and
returning it in cl promise instead of delaying this work until
continuation runs. This has a nice side effect of simplifying digest
mismatch handling and making it exception free.

Fixes #977.

Message-Id: <1457015870-2106-3-git-send-email-gleb@scylladb.com>
2016-03-03 16:48:28 +02:00
Gleb Natapov
e4ac5157bc storage_proxy: store only one data reply in digest resolver.
Read executor may ask for more than one data reply during digest
resolving stage, but only one result is actually needed to satisfy
a query, so no need to store all of them.

Message-Id: <1457015870-2106-2-git-send-email-gleb@scylladb.com>
2016-03-03 16:47:53 +02:00
Gleb Natapov
69b61b81ce storage_proxy: fix cl achieved condition in digest resolver timeout handler
In digest resolver for cl to be achieved it is not enough to get correct
number of replies, but also to have data reply among them. The condition
in digest timeout does not check that. Fortunately, we have a variable
that we set to true when cl is achieved, so use it instead.

Message-Id: <1457015870-2106-1-git-send-email-gleb@scylladb.com>
2016-03-03 16:47:11 +02:00
Tomasz Grabiec
2abd62b5cb bytes_ostream: Drop methods which serialize integers
This will make bytes_ostream completely agnostic to serialization
format, which should be determined by layer above it.

Message-Id: <1457004221-8345-2-git-send-email-tgrabiec@scylladb.com>
2016-03-03 13:27:27 +02:00
Tomasz Grabiec
aaac2a3cec serializer: Add missing include
Message-Id: <1457004221-8345-1-git-send-email-tgrabiec@scylladb.com>
2016-03-03 13:27:22 +02:00
Pekka Enberg
9c930d88a0 db/system_keyspace: Remove ifdef'd code
We have our implementations of all the three ifdef'd functions.

Message-Id: <1456926917-12594-1-git-send-email-penberg@scylladb.com>
2016-03-03 12:26:50 +02:00
Takuya ASADA
da56325f69 configure.py: add support --static-stdc++ for seastar binaries (iotune)
Ubuntu 14.04LTS package is broken now because iotune is not statically linked against libstdc++, so this patch fixes it.
Requires seastar patch to add --static-stdc++ on configure.py.

Fixes #982

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456995050-22007-1-git-send-email-syuu@scylladb.com>
2016-03-03 12:18:47 +02:00
Avi Kivity
d4c92c7e27 Merge seastar upstream
* seastar b3fc7c5...ba615c7 (1):
  > configure.py: add --static-stdc++ to link libstdc++ statically
2016-03-03 12:18:23 +02:00
Asias He
01cb6b0d42 gossip: Send syn message in parallel and do not wait for it
1) As explained in commit 697b16414a (gossip: Make gossip message
handling async), in each gossip round we can talk to the 1-3
peer nodes in parallel to reduce the latency of the gossip round.

2) Gossip syn message uses one way rpc message, but now the returned
future of the one way message is ready only when message is dequeued for
some reason (sent or dropped). If we wait for the one way syn message to
return it might block the gossip round for an unbounded time. To fix, do
not wait for it in the gossip round. The downside is there will be no
back pressure to bound the syn messages, however since the messages are
once per second, I think it is fine.
Message-Id: <ea4655f121213702b3f58185378bb8899e422dd1.1456991561.git.asias@scylladb.com>
2016-03-03 11:17:50 +02:00
Takuya ASADA
e545013e47 Revert "dist: downgrade g++ to 4.9 on Ubuntu"
This reverts commit 01bd4959ac.

Fixes #983

Conflicts:
	dist/ubuntu/build_deb.sh
	dist/ubuntu/control.in
	dist/ubuntu/rules.in

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456996244-19889-1-git-send-email-syuu@scylladb.com>
2016-03-03 11:12:18 +02:00
Tomasz Grabiec
04f2482d74 schema_tables: Log results of schema merge
Currently schema changes are only logged at coordinator node which
initiates the change. It would be helpful in post-mortem analysis to
also see when and how schema changes are resolved when applied on
other nodes.
Message-Id: <1456953095-1982-1-git-send-email-tgrabiec@scylladb.com>
2016-03-03 11:12:15 +02:00
Nadav Har'El
2cf09147b5 Repair: don't use freeze() to calculate mutation checksums
Use the existing "feed_hash" mechanism to find a checksum of the
content of a mutation, instead of serializing the mutation (with freeze())
and then finding the checksum of that string.

The serialized form is more prone to future changes, and not really
guaranteed to provide equal hashes for mutations which are considered
"equal".

Fixes #971

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1456958676-27121-1-git-send-email-nyh@scylladb.com>
2016-03-03 09:58:24 +01:00
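The concern in 2cf09147b5 above, that a serialized form need not hash equally for mutations considered equal, can be shown with a small sketch. The dict-based "mutation" and both checksum functions are hypothetical; the name feed_hash is borrowed from the commit message, but the body here is an illustration and not Scylla's implementation.

```python
import hashlib


def checksum_of_serialized(mutation):
    """Hash one particular serialization of the mutation (here, repr())."""
    return hashlib.sha256(repr(mutation).encode()).hexdigest()


def feed_hash(hasher, mutation):
    """Feed the logical content to the hasher in a canonical column order."""
    for column in sorted(mutation):
        hasher.update(column.encode())
        hasher.update(b"\0")
        hasher.update(mutation[column].encode())
        hasher.update(b"\0")


def checksum_of_content(mutation):
    hasher = hashlib.sha256()
    feed_hash(hasher, mutation)
    return hasher.hexdigest()


# Two mutations that compare equal but were built in a different order.
a = {"name": "x", "city": "y"}
b = {"city": "y", "name": "x"}
```

Here `a == b` holds, yet the two serializations differ because dicts keep insertion order, so only the content-based checksum agrees for both.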
Avi Kivity
bec30ccf25 build: add order-only dependency between building antlr .o and IDL headers
This ensures that if an antlr generated .cpp file depends on an
IDL-generated .hh file, then that .hh is generated before the .o is
built.
2016-03-03 09:52:25 +02:00
Tomasz Grabiec
b42d3a90b3 cql3: create_table_statement: Sort _defined_names by text
Currently they are sorted by address in memory, which breaks the
check for column name duplicates, which assumes sorting by text.

Fixes #975.

Message-Id: <1456937400-20475-1-git-send-email-tgrabiec@scylladb.com>
2016-03-02 18:53:43 +02:00
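The bug fixed by b42d3a90b3 above can be reproduced in miniature. A duplicate check that compares neighbours only works if equal names end up adjacent, which sorting by text guarantees and sorting by address does not. Identifier, its `address` field (a stand-in for a pointer value), and has_adjacent_duplicates are hypothetical names for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    text: str
    address: int  # stand-in for the object's address in memory


def has_adjacent_duplicates(names):
    """Duplicate check that assumes equal names are neighbours."""
    return any(a.text == b.text for a, b in zip(names, names[1:]))


names = [
    Identifier("id", 300),
    Identifier("value", 200),
    Identifier("id", 100),
]

# Sorted by address: id(100), value(200), id(300). The duplicates are apart.
by_address = sorted(names, key=lambda n: n.address)
# Sorted by text: id, id, value. The duplicates are neighbours.
by_text = sorted(names, key=lambda n: n.text)
```

With the address ordering the duplicate column name "id" goes unnoticed; with the text ordering the same check catches it.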
Avi Kivity
dda77d14b9 Merge seastar upstream
* seastar 9964cbf...b3fc7c5 (2):
  > Introduce util/indirect.hh
  > reactor: new counters for the io queue
2016-03-02 18:52:36 +02:00
Calle Wilund
0c3322befd commitlog: Ensure segment survives whole flush call
Must keep shared pointer alive.
Likewise though, the shared pointer copy in cycle main continuation
is not needed.

Message-Id: <1456931988-5876-3-git-send-email-calle@scylladb.com>
2016-03-02 18:22:13 +02:00
Calle Wilund
f1c4e3eb3d commitlog: Clear reserve segments in orphan_all
Otherwise they will keep the segment_manager alive (leak).
Fixes jenkins ASan errors.

Message-Id: <1456931988-5876-2-git-send-email-calle@scylladb.com>
2016-03-02 18:22:09 +02:00
Calle Wilund
a556f665c0 commitlog: Take segment_manager locks first in write/flush
While it is formally better to take a local lock first and
then contend for a global one, in this case it is arguably
better to ensure we get a gate exception synchronously (early)
instead of potentially in a continuation. Old version might
cause us to do a gate::leave even while never entered.

And since we should really only have one active (contending)
segment per shard anyway, it should not matter.

Message-Id: <1456931988-5876-1-git-send-email-calle@scylladb.com>
2016-03-02 18:22:05 +02:00