Commit Graph

442 Commits

Author SHA1 Message Date
Vlad Zolotarov
de6cf8db51 db::config: add get_conf_dir()
This function returns the directory containing the configuration
files. It takes the environment variables into account as follows:
   - If SCYLLA_CONF is defined, it is the directory
   - else if SCYLLA_HOME is defined, then $SCYLLA_HOME/conf is the directory
   - else "conf" is the directory, i.e. the configuration files are
     looked up in ./conf
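
The lookup order above can be sketched as follows. This is a minimal
illustration only: get_conf_dir_sketch() is a hypothetical free function,
and the real signature and return type of db::config::get_conf_dir() are
not shown in this log.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-in for get_conf_dir(); returns a plain std::string
// because the real return type is not given in the commit message.
std::string get_conf_dir_sketch() {
    // 1. SCYLLA_CONF, if defined, names the directory directly.
    if (const char* conf = std::getenv("SCYLLA_CONF")) {
        return conf;
    }
    // 2. Otherwise fall back to $SCYLLA_HOME/conf.
    if (const char* home = std::getenv("SCYLLA_HOME")) {
        return std::string(home) + "/conf";
    }
    // 3. Otherwise use "conf", relative to the working directory.
    return "conf";
}
```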

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Updated get_conf_dir() description.
2015-10-08 20:57:11 +03:00
Pekka Enberg
5878f62b18 db/schema_tables: Clean up indentation
Almost the whole file is (accidentally) indented four spaces to the
right for no reason. Fix that up because it's annoying as hell.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 17:09:27 +02:00
Pekka Enberg
1f9e769dd3 db/schema_tables: Remove obsolete ifdef'd code
Remove ifdef'd code that we won't be converting to C++ because of design
differences.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 17:09:27 +02:00
Pekka Enberg
6e304cd58c db/schema_tables: Fix merge_keyspaces() to actually drop keyspaces
When we query schema keyspaces after we have applied a delete mutation,
the dropped keyspace does not exist in the "after" result set. Fix the
merge_keyspaces() algorithm to take that into account.

Makes merge_keyspaces() actually call database::drop_keyspace() when a
keyspace is dropped.
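
A rough sketch of the idea, assuming the "before" and "after" result sets
can be reduced to sets of keyspace names. The names keyspace_diff and
diff_keyspaces() are hypothetical and do not come from schema_tables.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: names that appeared and names that vanished.
struct keyspace_diff {
    std::vector<std::string> created;
    std::vector<std::string> dropped;
};

// A dropped keyspace is present in "before" but missing from "after",
// so it can only be found by walking the "before" set.
keyspace_diff diff_keyspaces(const std::set<std::string>& before,
                             const std::set<std::string>& after) {
    keyspace_diff d;
    for (const auto& name : after) {
        if (before.count(name) == 0) {
            d.created.push_back(name);
        }
    }
    for (const auto& name : before) {
        if (after.count(name) == 0) {
            d.dropped.push_back(name);
        }
    }
    return d;
}
```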

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 14:53:35 +03:00
Pekka Enberg
5d9d1e28cb db/schema_tables: Implement make_drop_keyspace_mutations()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 14:53:35 +03:00
Pekka Enberg
633279415d db/schema_tables: Fix merge_tables() to actually drop tables
When we query schema tables after we have applied a delete mutation, the
dropped table does not exist in the "after" result set. Fix the
merge_tables() algorithm to take that into account.

Makes merge_tables() actually call database::drop_column_family() when
a table is dropped.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
82d20dba65 db/schema_tables: Implement make_drop_table_mutations()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
b89b70daa8 db/schema_tables: Wire up drop column notifications
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
0651ab6901 database: Futurize drop_column_family() function
Futurize drop_column_family() so that we can call truncate() from it.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
b74a9d99d5 db/schema_tables: Fix UTF-8 serialization
Use the utf8_type to serialize strings instead of using to_bytes().

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-05 09:26:15 +02:00
Calle Wilund
7856d7fe02 config: Change "auto_snapshot" to "used" 2015-09-30 09:09:42 +02:00
Calle Wilund
b3c95ce42d system_keyspace: Change truncation record method to use context qp
Align with rest of file (for better or worse). This allows calls from
entity without query_processor handy (i.e. storage_proxy).

Added "minimal" setup method for the "global" state, to facilitate
tests. Doing a full setup either in cql_test_env or after it is created
breaks badly. (Not sure why). So quick workaround.

Updated the current two users (batchlog_manager and commitlog_replayer)
callsites to conform.
2015-09-30 09:09:41 +02:00
Calle Wilund
3abd8b38b6 query_context: Expose query_processor (local) 2015-09-30 09:09:41 +02:00
Avi Kivity
0ec0e32014 Merge "commitlog: preallocate segments" from Calle
"Modified version of the initial patch (which was reverted), further
reducing the possible delay states in CL allocation and segment management."
2015-09-29 17:02:54 +03:00
Pekka Enberg
f43f0d6f04 keys: Add compound_wrapper::from_singular()
Clean up code by adding a from_singular() helper function to compound
wrapper and use it in.
2015-09-28 16:29:44 +02:00
Calle Wilund
4941d91063 Commitlog: add some more verbosity 2015-09-22 12:57:33 +02:00
Paweł Dziepak
34e66e60c1 main: disable thrift by default
Fixes #205.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-22 09:48:44 +02:00
Calle Wilund
a10745cf0e Commitlog: Delay timer by period/ncpus for each cpu
To avoid having all shards doing sync at the same time.
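
One way to read the title is that each cpu offsets its first timer
expiry by its index times period/ncpus. The sketch below assumes that
reading; initial_sync_delay() is a hypothetical helper, not the
commitlog's actual code.

```cpp
#include <chrono>

// Hypothetical helper: offset of the first sync on a given cpu (shard).
// Assumes ncpus > 0 and cpu < ncpus.
std::chrono::milliseconds initial_sync_delay(std::chrono::milliseconds period,
                                             unsigned cpu, unsigned ncpus) {
    // cpu 0 starts immediately, cpu 1 after period/ncpus, and so on,
    // so the shards do not all sync at the same instant.
    return std::chrono::milliseconds(period.count() * cpu / ncpus);
}
```

With a 10 s period on 4 cpus this yields offsets of 0, 2.5, 5 and 7.5
seconds.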
2015-09-21 13:30:35 +02:00
Calle Wilund
dcabf8c1d2 Commitlog: Pre-allocate "reserve" segments
Refs #356

Pre-allocates N segments from timer task. N is "adaptive" in that it is
increased (to a max) every time segment acquisition is forced to allocate
a new one instead of picking from the pre-alloc (reserve) list. The idea is
that it is easier to adapt how many segments we consume per timer quantum
than the timer quantum itself.
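
The adaptive reserve can be sketched roughly as below. All names
(reserve_pool, on_timer, acquire) are hypothetical, segments are reduced
to an id, and the sketch assumes max_target >= 1; the real commitlog
works with futures and files.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

struct segment { int id; };  // stand-in for a real commitlog segment

class reserve_pool {
    std::deque<segment> _reserve;
    std::size_t _target = 1;      // the adaptive "N"
    std::size_t _max_target;
    int _next_id = 0;
    segment allocate() { return segment{_next_id++}; }
public:
    explicit reserve_pool(std::size_t max_target) : _max_target(max_target) {}

    // Timer task: top the reserve up to the current target.
    void on_timer() {
        while (_reserve.size() < _target) {
            _reserve.push_back(allocate());
        }
    }

    // Write path: prefer a reserved segment. On a miss, allocate inline
    // and raise the target (up to the max) for the next timer tick.
    segment acquire() {
        if (!_reserve.empty()) {
            segment s = _reserve.front();
            _reserve.pop_front();
            return s;
        }
        _target = std::min(_target + 1, _max_target);
        return allocate();
    }

    std::size_t target() const { return _target; }
    std::size_t reserved() const { return _reserve.size(); }
};
```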

Also does disk pressure check and flush from timer task now. Note that the
check is still only done max once every new segment.

Some logging cleanup/betterment also to make behaviour easier to trace.

Reserve segments start out at zero length, and are still deleted when finished.
This is because otherwise we'd still have to clear the file to be able to
properly parse it later (given that it can be a "half" file due to power fail
etc). This might need revisiting as well.

With this patch, there should be no case (except flush starvation) where
"add_mutation" actually waits for a (potentially) blocking op (disk).
Note that since the amount of reserve is increased as needed, there will
be occasional cases where a new segment is created in the alloc path
until the system finds equilibrium. But this should only be during a brief
warmup.

v2: Fixed timestamp not being reset on reserve acquire
2015-09-21 13:04:39 +02:00
Pekka Enberg
6cef7d8270 db/schema_tables: Fix calculate_schema_digest()
map_reduce() can run the reducer out-of-order which breaks the MD5 hash.
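
To illustrate why ordering matters: a streaming hash depends on the
order in which chunks are fed to it. The sketch below uses FNV-1a as a
stand-in for MD5 and sorts per-item results by index before hashing.
That is one possible way to get a stable digest, not necessarily what
this commit does; all names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// FNV-1a update step; a simple stand-in for feeding data to MD5.
std::uint64_t fnv1a_update(std::uint64_t h, const std::string& chunk) {
    for (unsigned char c : chunk) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Results arrive as (index, chunk) pairs in arbitrary completion order.
// Sorting by index first makes the digest independent of that order.
std::uint64_t digest_of_results(
        std::vector<std::pair<std::size_t, std::string>> results) {
    std::sort(results.begin(), results.end());
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
    for (const auto& r : results) {
        h = fnv1a_update(h, r.second);
    }
    return h;
}
```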

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

Fixes #357. [tgrabiec]
2015-09-21 11:51:17 +02:00
Avi Kivity
d5cf0fb2b1 Add license notices 2015-09-20 10:43:39 +03:00
Avi Kivity
dcdc925b86 Revert "Commitlog: Pre-allocate "reserve" segments"
This reverts commit cbf3b63853, due to
reports of increased latency (instead of the opposite).
2015-09-19 09:26:39 +03:00
Calle Wilund
cbf3b63853 Commitlog: Pre-allocate "reserve" segments
Refs #356

Pre-allocates N segments from timer task. N is "adaptive" in that it is
increased (to a max) every time segment acquisition is forced to allocate
a new one instead of picking from the pre-alloc (reserve) list. The idea is
that it is easier to adapt how many segments we consume per timer quantum
than the timer quantum itself.

Also does disk pressure check and flush from timer task now. Note that the
check is still only done max once every new segment.

Some logging cleanup/betterment also to make behaviour easier to trace.

Reserve segments start out at zero length, and are still deleted when finished.
This is because otherwise we'd still have to clear the file to be able to
properly parse it later (given that it can be a "half" file due to power fail
etc). This might need revisiting as well.

With this patch, there should be no case (except flush starvation) where
"add_mutation" actually waits for a (potentially) blocking op (disk).
Note that since the amount of reserve is increased as needed, there will
be occasional cases where a new segment is created in the alloc path
until the system finds equilibrium. But this should only be during a brief
warmup.
2015-09-17 19:54:28 +03:00
Calle Wilund
b512192b3b Commitlog: Fix some timing/latency issues with sync
Refs #356

* Move sync time setting to sync initiate to help prevent double syncs
* Change add_mutation to only do explicit sync with wait if time elapsed
  since last is 2x sync window
* Do not wait for sync when moving to new segment in alloc path
* Initiate _sync_time properly.
* Add some tracing log messages to help debug
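
The second bullet can be sketched as a simple predicate. The name
must_wait_for_sync() and the use of std::chrono::steady_clock are
assumptions made for illustration; the commitlog's real clock and
bookkeeping are not shown here.

```cpp
#include <chrono>

using clock_type = std::chrono::steady_clock;

// Hypothetical check on the add_mutation path: only block on an explicit
// sync when at least two sync windows have passed since the last one.
bool must_wait_for_sync(clock_type::time_point now,
                        clock_type::time_point last_sync,
                        clock_type::duration sync_window) {
    return now - last_sync >= 2 * sync_window;
}
```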
2015-09-16 20:07:25 +03:00
Calle Wilund
d42ff89e83 Config: Promote logging of unhandled options to warning
Fixes #222
2015-09-16 15:43:53 +03:00
Calle Wilund
bf727b2272 config.cc : add logging of unset attributes
Helps checking for missing stuff in scylla.yaml
2015-09-16 15:43:35 +03:00
Calle Wilund
8172717ba0 config.hh : update some default values to match scylla.conf 2015-09-16 15:43:35 +03:00
Calle Wilund
04562b23b4 commitlog_replayer: More correct fix for reordering issue in replay
* Removes previous, accidental fix that got committed.
* Instead just do not give RP:s to replay mutations. This is same as in Origin,
  and just as/more correct, since we intend to flush the data to sstables
  asap anyway
2015-09-16 15:41:17 +03:00
Avi Kivity
cab2148141 Merge "partial sstable handling" from Raphael
closes issue #75.
2015-09-13 12:03:50 +03:00
Gleb Natapov
17e54d0604 add logger for consistency level calculation 2015-09-13 11:59:17 +03:00
Raphael S. Carvalho
c729ea36e1 commitlog: guard commit log replay against reordering
After killing scylla in the middle of a write, the next scylla
instance failed to finish commit log replay, showing the following
error message:

scylla: core/future.hh:448: void promise<T>::set_value(A&& ...)
[with A = {}; T = {}]: Assertion `_state' failed.

After a long debug session, I figured out that check_valid_rp() was
triggering the exception replay_position_reordered_exception, which
means replay position reordering.

Looking at 8b9a63a3c6, I noticed that database::apply is guarded
against reordering, but commitlog replay code is not.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-12 06:17:14 -03:00
Gleb Natapov
04d2bef55b give preference to local data during query
Until dynamic snitch is implemented this is better than nothing.

Fixes #322
2015-09-10 15:45:20 +03:00
Pekka Enberg
1f7fa18970 db/schema_tables: Fix create keyspace notification
We need to send out the notification for all created keyspaces, not just
for the first one.

Spotted during code inspection.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-09-10 10:35:59 +03:00
Avi Kivity
6ccb0b15b5 Merge "Move the API configuration from command line to configuration" from Amnon
"It moves the API configuration from the command line argument to the general
config. It also makes the api-doc directory configurable instead of hard
coded."
2015-09-09 12:35:03 +03:00
Gleb Natapov
df468504b6 schema_table: convert code to use distributed<storage_proxy> instead of storage_proxy&
All database code was converted to it when storage_proxy was made
distributed, but then new code was written to use storage_proxy& again.
Passing distributed<> object is safer since it can be passed between
shards safely. There was a patch to fix one such case yesterday, I found
one more while converting.
2015-09-09 10:19:30 +03:00
Calle Wilund
d46a95242a Config: Fix type where alias destination was copied instead of referenced
Fixes #310

Missing '&'.
(And no, cannot make the type non-copyable, since we want to copy config
 objects).
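
The class of bug described here can be shown in a few lines. This is a
generic illustration with hypothetical names (config_sketch, data_dir),
not the actual config code.

```cpp
#include <string>

struct config_sketch {
    std::string data_dir = "old";
};

// Buggy variant: without '&' the alias is a detached copy, so the write
// never reaches the destination option.
std::string set_via_copied_alias(config_sketch& cfg, const std::string& v) {
    auto alias = cfg.data_dir;   // copy
    alias = v;
    return cfg.data_dir;         // still the old value
}

// Fixed variant: the alias refers to the destination option itself.
std::string set_via_reference_alias(config_sketch& cfg, const std::string& v) {
    auto& alias = cfg.data_dir;  // reference
    alias = v;
    return cfg.data_dir;         // updated value
}
```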
2015-09-08 16:54:04 +03:00
Calle Wilund
456246dfd5 Commitlog: Add a gate + shutdown method
* Gate ensures we don't add data into a segment after close
* Shutdown closes all segments for business and prohibits new segments
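
A gate in this sense counts in-flight operations and rejects new ones
once closed. The synchronous class below is a simplified stand-in
written for illustration; Seastar's own seastar::gate offers enter(),
leave() and a close() that returns a future, and whether the commit uses
it is not stated here.

```cpp
#include <cstddef>
#include <stdexcept>

// Simplified, synchronous gate; names are hypothetical.
class simple_gate {
    std::size_t _in_flight = 0;
    bool _closed = false;
public:
    // Called before adding data to a segment; fails after close().
    void enter() {
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
        ++_in_flight;
    }
    // Called when the operation has finished.
    void leave() { --_in_flight; }
    // Shutdown: no new operations may enter from now on.
    void close() { _closed = true; }

    bool closed() const { return _closed; }
    std::size_t in_flight() const { return _in_flight; }
};
```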
2015-09-08 11:53:41 +02:00
Calle Wilund
d666c747e3 Commitlog: Just add some more verbosity 2015-09-08 11:16:38 +02:00
Avi Kivity
a95d3f9cf5 Merge "Commitlog shutdown" from Calle
"Refs #293

* Add a commitlog::sync_all_segments, that explicitly forces all pending
  disk writes
* Only delete segments from disk IFF they are marked clean. Thus on partial
  shutdown or whatnot, even if CL is destroyed (destructor runs), disk files
  not yet clean vis-à-vis sstables are preserved and replayable
* Do a sync_all_segments first of all in database::stop.

Exactly what to not stop in main I leave up to others' discretion, or at least
another patch."
2015-09-08 11:11:18 +03:00
Tomasz Grabiec
15ae1a92cb Merge branch 'pdziepak/compaction-remove-items/v4' from seastar-dev.git
From Pawel:

This series makes compaction remove items that are no longer items:
 - expired cells are changed into tombstones
 - items covered by higher level tombstones are removed
 - expired tombstones are removed if possible
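
The three rules can be sketched per cell as below. The cell layout, the
compact_cell() name and the can_purge flag (standing in for "if
possible") are assumptions for illustration only; the real compaction
code works on mutations and sstables.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical, heavily simplified cell.
struct cell {
    std::int64_t timestamp;              // write timestamp
    bool tombstone;                      // true for a deletion marker
    std::optional<std::int64_t> expiry;  // TTL expiry of a live cell
    std::int64_t deletion_time;          // meaningful for tombstones only
};

// Returns the compacted cell, or nullopt if the cell is dropped.
std::optional<cell> compact_cell(cell c,
                                 std::optional<std::int64_t> covering_tombstone,
                                 std::int64_t now, std::int64_t gc_grace,
                                 bool can_purge) {
    // Items covered by a higher level tombstone are removed.
    if (covering_tombstone && c.timestamp <= *covering_tombstone) {
        return std::nullopt;
    }
    // Expired cells are changed into tombstones.
    if (!c.tombstone && c.expiry && *c.expiry <= now) {
        c.tombstone = true;
        c.deletion_time = *c.expiry;
        c.expiry.reset();
    }
    // Expired tombstones are removed if possible.
    if (c.tombstone && can_purge && c.deletion_time + gc_grace <= now) {
        return std::nullopt;
    }
    return c;
}
```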

Fixes #70.
Fixes #71.
2015-09-08 09:23:00 +02:00
Amnon Heiman
8be2ee54aa configuration: Add the API configuration to the general configuration
This adds the API configuration parameters to the configuration, so it
will be possible to take them from the configuration file or from the
command line.

The following configuration parameters were defined:
api_port
api_address
api_ui_dir
api_doc_dir

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-09-08 02:56:47 +03:00
Paweł Dziepak
64949e8339 schema: make gc_grace_seconds() return gc_clock::duration
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-07 21:14:41 +02:00
Calle Wilund
256c0550bf Commitlog: Only delete segments on disk if they are marked clean
For #293 - i.e. allow more or less coherent shutdown/destruction of the
commitlog while retaining disk data.
(tests still clear stuff explicitly).
2015-09-07 20:32:01 +02:00
Calle Wilund
4ed95b7020 Commitlog: Add sync_all_segments()
For #293 - allows explicit flush to disk (not close!) of all active segments
2015-09-07 20:31:59 +02:00
Calle Wilund
d614143f5e Commitlog/database: Fixup series "Commit log flush request on disk overflow"
Also at seastar-dev: calle/commitlog_flush_v3
(And, yes, this time I _did_ update the remote!)

Refs #262

Commit of original series was done on stale version (v2) due to author's
inability to multitask and update git repos.

v3:
* Removed future<> return value from callbacks. I.e. flush callback is now
  only fully synchronous over actual call
2015-09-07 21:29:19 +03:00
Avi Kivity
dee9060b12 Merge "Commit log flush request on disk overflow" from Calle
"Fixes #262

Handles CL disk size exceeding configured max size by calling flush handlers
for each dirty CF id / high replay_position mark. (Instead of uncontrolled
delete as previously).

* Increased default max disk size to 8GB. Same as Origin/scylla.yaml (so no
   real change, but synced).
* Divide the max disk size by cpus (so sum of all shards == max)
* Abstract flush callbacks in CL
* Handler in DB that initiates memtable->sstable writes when called.

Note that the flush request is done "synchronously" in new_segment() (i.e.
when getting a new segment and crossing threshold). This is however more or
less congruent with Origin, which will do a request-sync in the corresponding
case.
Actual dealing with the request should at least in production code however be
done async, and in DB it is, i.e. we initiate sstable writes. Hopefully
they finish soon, and CL segments will be released (before next segment is
allocated).

If the flush request does _not_ eventually result in any CF:s becoming
clean and segments released we could potentially be issuing flushes
repeatedly, but never more often than on every new segment."
2015-09-07 18:46:48 +03:00
Gleb Natapov
da242146b6 do not pass storage_proxy reference across cpus
storage_proxy instances are per cpu, so they cannot be passed around to
other cpus.
2015-09-07 17:16:29 +02:00
Calle Wilund
fdb921afb2 Commitlog: Add flushing of segment CF:s on disk overflow
* Do not throw away commitlog segments on disk size overflow.
  Issue a flush request (i.e. calculate RP we want to free up to,
  and for all dirty CF:s, do a request).
  "Abstracted" as registerable callback. I.e. DB:s responsibility
  to actually do something with it.
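
A rough sketch of the callback mechanism, combined with the per-cpu
size limit from the neighbouring commits. Every name here
(commitlog_sketch, flush_handler, cf_id, on_new_segment) is
hypothetical, and the types are simplified to integers.

```cpp
#include <cstdint>
#include <functional>
#include <set>
#include <utility>
#include <vector>

using cf_id = int;                      // stand-in for a column family id
using replay_position = std::uint64_t;  // stand-in for an RP
using flush_handler = std::function<void(cf_id, replay_position)>;

class commitlog_sketch {
    std::vector<flush_handler> _handlers;
    std::set<cf_id> _dirty;
    std::uint64_t _max_disk_size;
    std::uint64_t _disk_size = 0;
public:
    // The configured max is split across cpus so the sum over all shards
    // stays at the configured value. Assumes ncpus > 0.
    commitlog_sketch(std::uint64_t total_max_size, unsigned ncpus)
        : _max_disk_size(total_max_size / ncpus) {}

    void add_flush_handler(flush_handler h) { _handlers.push_back(std::move(h)); }
    void mark_dirty(cf_id id) { _dirty.insert(id); }
    std::uint64_t max_disk_size() const { return _max_disk_size; }

    // On crossing the limit, ask for a flush of every dirty CF up to
    // high_rp instead of deleting segments. Acting on the request is the
    // handler's (i.e. the database's) job.
    void on_new_segment(std::uint64_t segment_size, replay_position high_rp) {
        _disk_size += segment_size;
        if (_disk_size <= _max_disk_size) {
            return;
        }
        for (cf_id id : _dirty) {
            for (const auto& h : _handlers) {
                h(id, high_rp);
            }
        }
    }
};
```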
2015-09-07 13:21:43 +02:00
Calle Wilund
31f2dcb342 Config: change commitlog max size on disk to be in sync with scylla.yaml 2015-09-07 13:13:51 +02:00
Calle Wilund
841dd32a8a Commitlog: divide max on-disk-size by num cpus
To try to keep the resulting limit as configured
2015-09-07 13:13:46 +02:00