"This series modifies the stub implementations of unimplemented API methods to
return a 500 HTTP error.
It does so by adding a new API exception, unimplemented_exception, and a helper
function, unimplemented, that throws that exception.
A call to unimplemented was added to each of the stub API methods.
After this series, a call to an unimplemented API would return a 500."
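The mechanism described above can be sketched as follows. This is a minimal illustration, not the code from the series: only the names unimplemented_exception and unimplemented come from the description, while api_exception, its status() accessor, the message text and the dispatch() function are assumptions made up for the example.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical base class for API errors that carry an HTTP status code.
class api_exception : public std::runtime_error {
public:
    api_exception(int status, const std::string& msg)
        : std::runtime_error(msg), _status(status) {}
    int status() const noexcept { return _status; }
private:
    int _status;
};

// Thrown by stub handlers; mapped to HTTP 500 as the series describes.
class unimplemented_exception : public api_exception {
public:
    unimplemented_exception()
        : api_exception(500, "API call is not implemented") {}
};

// Helper that each stub calls before doing anything else.
[[noreturn]] inline void unimplemented() {
    throw unimplemented_exception();
}

// Hypothetical dispatcher: runs a handler and turns an API exception
// into the status code that would be sent to the client.
inline int dispatch(const std::function<std::string()>& handler) {
    try {
        handler();
        return 200;
    } catch (const api_exception& e) {
        return e.status();
    }
}
```

With this shape, a stub only needs one extra line, and the caller sees a 500 instead of a made-up result.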
A connection drop during a read operation is not an error and should not be
reported as such. Furthermore, disconnects are already reported by
gossip, so there is no need to report them again for each ongoing read.
Fixes #320
"It moves the API configuration from the command line arguments to the general
config; it also makes the api-doc directory configurable instead of hard
coded."
"This series enables nodetool info by completing the missing APIs.
The main change is returning fixed values for storage_service
is_rpc_server_running, is_native_transport_running and get_exception_count.
After this series it will be possible to run:
nodetool info (while the jmx is running) and to get the results without errors
or crashes."
Size-tiered compaction strategy works by creating buckets of sstables
of similar size, but if a bucket holds more than max_threshold sstables
(defined in schema), it will not be selected for compaction.
The scenario described in issue 298 hits exactly this. If compaction takes a
long time to finish, more than max_threshold sstables will be created,
and thus there won't be a 'valid' bucket for compaction.
The solution is to not add an sstable to a bucket that has reached its limit,
so that the bucket will have a chance to be compacted.
Fixes issue #298.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
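A rough sketch of the bucketing rule described in this message, assuming buckets are keyed by average size and that "similar" means within a low/high factor of that average. The names make_buckets, bucket_low and bucket_high and their default values are illustrative assumptions, not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct bucket {
    std::vector<uint64_t> sizes;  // sizes of the sstables in this bucket
    double average;               // running average of those sizes
};

// Groups sstable sizes into buckets of similar size. A bucket that already
// holds max_threshold sstables is treated as full, so further sstables of
// that size open a new bucket instead of pushing the old one over the limit.
inline std::vector<bucket> make_buckets(std::vector<uint64_t> sstable_sizes,
                                        std::size_t max_threshold,
                                        double bucket_low = 0.5,
                                        double bucket_high = 1.5) {
    std::sort(sstable_sizes.begin(), sstable_sizes.end());
    std::vector<bucket> buckets;
    for (uint64_t size : sstable_sizes) {
        bool placed = false;
        for (auto& b : buckets) {
            bool similar = size > b.average * bucket_low
                        && size < b.average * bucket_high;
            if (similar && b.sizes.size() < max_threshold) {
                double total = b.average * b.sizes.size() + size;
                b.sizes.push_back(size);
                b.average = total / b.sizes.size();
                placed = true;
                break;
            }
        }
        if (!placed) {
            bucket b;
            b.sizes.push_back(size);
            b.average = static_cast<double>(size);
            buckets.push_back(std::move(b));
        }
    }
    return buckets;
}
```

With six equally sized sstables and max_threshold of 4, this yields one bucket of four and one of two, so there is always a bucket that is eligible for compaction.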
Fixes #309.
When scanning memtable readers detect that the memtable was flushed, which means that
it started to be moved to cache, they fall back to reading from
the memtable's sstable.
Eventually what we should do is to combine memtable and cache contents
so that as long as data is not evicted we won't do IO. We do not
support scanning in cache yet though, so there is no point in doing
this now, and it is not trivial.
This fixes a mysterious compilation problem which popped up after
changing header order. It could be that the name "VERSION" picks up
some macro, but I haven't really figured that out exactly.
Deleting sstables is tricky, since they can be shared across shards.
This patchset introduces an sstable deletion agreement table, that records
the agreement of shards to delete an sstable. Sstables are only deleted
after all shards have agreed.
With this, we can change core count across boots.
Fixes #53.
All database code was converted to it when storage_proxy was made
distributed, but then new code was written to use storage_proxy& again.
Passing a distributed<> object is safer since it can be passed between
shards safely. There was a patch to fix one such case yesterday; I found
one more while converting.
Refs #293
Even more horrible than the shutdown patch. Tests using cql_test_env
are dependent on init.cc functions, but when scylla stopped being shut down
properly, those tests did too -> assert in sharded.hh
Yet another temp patch, simply duplicating the init.cc code for cql_test_env
to ensure we get what we think.
Because of the different implementation, the right way of getting the
load (which is the sum of all the live disk space used) is to use the
column_family API rather than the storage_service API.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
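As a small illustration of "load is the sum of all the live disk space used", here is a sketch that adds up a per-column-family figure. The cf_stats struct, its field name and total_load are hypothetical names for the example; how the real API collects the per-column-family values is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical per-column-family statistics.
struct cf_stats {
    uint64_t live_disk_space_used;
};

// Load is computed by summing the live disk space of every column family.
inline uint64_t total_load(const std::vector<cf_stats>& column_families) {
    return std::accumulate(
        column_families.begin(), column_families.end(), uint64_t(0),
        [](uint64_t sum, const cf_stats& cf) {
            return sum + cf.live_disk_space_used;
        });
}
```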
rpc_server
This patch changes the behaviour of is_native_transport_running and
is_rpc_server_running to return true instead of failing; we assume that
they are running. This should be changed when an API to start and stop
them is added.
get_exception_count will return 0. Its definition in origin
is the number of exceptions that were not caught in a thread.
We should rethink what it means in our implementation; meanwhile,
returning 0, for no exceptions, is a reasonable approach.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Some APIs other than column_family need to use get_cf_stats.
This adds the helper method declaration to column_family.hh and
changes the implementation's declaration to be non-static.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
The API contains stub methods. This adds a call to the unimplemented
helper in each stubbed method that is not implemented.
The return statement remains the same to help the compiler deduce the return type
of the lambda function.
After this patch, a call to an unimplemented API function will return
500.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
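The point about return type deduction can be shown with two lambdas. This is a sketch under assumptions: unimplemented here is a simplified stand-in that throws std::runtime_error, and the stub names and throws_when_called are invented for the example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>

// Simplified stand-in for the helper; the real one throws a dedicated
// API exception.
inline void unimplemented() {
    throw std::runtime_error("unimplemented");
}

// A stub that keeps its original return statement after the call.
const auto stub_with_return = [] {
    unimplemented();
    return std::string("");
};

// The same stub with the return statement removed.
const auto stub_without_return = [] {
    unimplemented();
};

// The never-reached return still drives deduction of the lambda's type.
static_assert(
    std::is_same<decltype(stub_with_return()), std::string>::value,
    "return type is deduced from the return statement");
static_assert(
    std::is_void<decltype(stub_without_return())>::value,
    "without a return statement the lambda returns void");

// Hypothetical helper: reports whether calling a stub throws.
template <typename Stub>
bool throws_when_called(Stub&& stub) {
    try {
        stub();
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}
```

Both stubs throw when called, but only the first still has the result type a handler registration would expect.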
If an sstable is shared, we can't just go ahead and remove it, because
other shards may still be using it; we need their agreement.
Since each shard will have its own sstable object for the same sstable,
we can't use the sstable class as a synchronization point. Instead,
use a static unordered map indexed by the TOC file name.
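One way to picture the agreement map is the sketch below. It is an assumption-laden simplification: the map is a class member rather than a static so it can be exercised in isolation, cross-shard communication is left out, and the names deletion_agreement, agree and pending are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Tracks which shards have agreed to delete each sstable, keyed by the
// sstable's TOC file name.
class deletion_agreement {
public:
    explicit deletion_agreement(std::size_t shard_count)
        : _shard_count(shard_count) {}

    // Records that `shard` agrees to delete the sstable. Returns true once
    // every shard has agreed, meaning the caller may now remove the files.
    bool agree(const std::string& toc_name, unsigned shard) {
        auto& shards = _agreed[toc_name];
        shards.insert(shard);  // repeated agreement from one shard is a no-op
        if (shards.size() < _shard_count) {
            return false;
        }
        _agreed.erase(toc_name);
        return true;
    }

    // Number of sstables still waiting for some shard to agree.
    std::size_t pending() const { return _agreed.size(); }

private:
    std::size_t _shard_count;
    std::unordered_map<std::string, std::unordered_set<unsigned>> _agreed;
};
```

Keying by file name rather than by object is what lets separate per-shard sstable objects meet at the same entry.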
"Refs #293
* Disables all normal service shutdowns.
* Calls "stop()" explicitly for database (which in turn will also flush
commitlog etc). Then just does a hard "_exit".
* Add shutdown() + gate to commitlog to prevent data from being added once
system shutdown is initiated. Will and does cause exceptions in write
paths during shutdown.
This is an explicitly asked for workaround series for the interdependency
issues making shutdown as formulated by at_exit + sharded::stop unreliable
right now.
Proper ways of doing this would be to
a.) Make services actually stop service when asked to (stop())
b.) Do shutdown in two steps; stop() and later delete.
Note: I've left the old "at_exit" calls commented in main.cc/init.cc as a
reminder that this is not a final solution."
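The shutdown() + gate idea for the commitlog can be sketched like this. It is a toy, synchronous stand-in: simple_gate and toy_commitlog are invented names, and a real gate would also track and wait for in-flight operations, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified gate: once closed, any attempt to enter throws.
class simple_gate {
public:
    void enter() const {
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
    }
    void close() { _closed = true; }
private:
    bool _closed = false;
};

// Toy commitlog that refuses new entries after shutdown().
class toy_commitlog {
public:
    void add(const std::string& entry) {
        _gate.enter();  // throws once shutdown() has been called
        _entries.push_back(entry);
    }
    void shutdown() { _gate.close(); }
    std::size_t size() const { return _entries.size(); }
private:
    simple_gate _gate;
    std::vector<std::string> _entries;
};
```

This mirrors the behaviour noted above: writes that arrive after shutdown has started fail with an exception instead of being silently accepted.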
Bloom filter loading and saving is slow with single-bit access to the bitmap,
causing latency spikes of ~100ms for 20MB sstables. Larger sstables will be
much worse.
Fix by using the newly introduced large_bitmap bulk load/save methods. With
this, the maximum observed task latency was 16ms.
Fixes #299 (partially at least; larger bitmaps may require more work still).
Single-bit accessors are very slow, especially because we don't support
setting a bit to a value (just set to 1 and clear to 0). This causes
loading and retrieving the contents of a bitmap to be painfully slow.
Fix by providing iterator-based load() and save() methods. The methods
support partial load/save so that access to very large bitmaps can be
split over multiple tasks.
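A sketch of what iterator-based, resumable load() and save() might look like. This is not the large_bitmap interface: toy_bitmap, the 64-bit word storage, and the position/max_words parameters are assumptions chosen to show how a bulk transfer can be split into bounded chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

class toy_bitmap {
public:
    explicit toy_bitmap(std::size_t nr_bits) : _words((nr_bits + 63) / 64) {}

    std::size_t word_count() const { return _words.size(); }

    // Copies at most max_words words from [first, last) into the bitmap,
    // starting at word index `position`. `first` is advanced so the caller
    // can resume; the return value is the next word index to fill.
    template <typename Iterator>
    std::size_t load(Iterator& first, Iterator last,
                     std::size_t position, std::size_t max_words) {
        std::size_t end = std::min(_words.size(), position + max_words);
        while (first != last && position < end) {
            _words[position++] = *first++;
        }
        return position;
    }

    // Writes at most max_words words to `out`, starting at word index
    // `position`. Returns the next word index to save.
    template <typename OutputIterator>
    std::size_t save(OutputIterator out,
                     std::size_t position, std::size_t max_words) const {
        std::size_t end = std::min(_words.size(), position + max_words);
        for (; position < end; ++position) {
            *out++ = _words[position];
        }
        return position;
    }

    bool test(std::size_t bit) const {
        return (_words[bit / 64] >> (bit % 64)) & 1;
    }

private:
    std::vector<uint64_t> _words;
};
```

Each call touches a bounded number of words, so a caller can yield between calls rather than processing the whole bitmap in one task.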
* Issue the "stop" method on DB (flushed CL + tables (partially))
* Do hard exit (_exit) to escape destructors and sanity checks.
This patch is horrible but sort of a workaround for various interdependency
shutdown issues. Until services can actually be turned off, this might be
a viable option.
Refs #293. I will not call it a fix.