464 Commits

Author SHA1 Message Date
Tomasz Grabiec
19d7d30e67 Replace references to 'urchin' with 'scylla' 2015-10-19 11:08:05 +03:00
Raphael S. Carvalho
a21af32eed db: do not ignore compaction strategy class
When building the in-memory schema for a column family, we were
ignoring compaction strategy class because of a bug in the
existing code. Example: suppose that you create a column family
with leveled compaction strategy. This option would be ignored
and the default strategy (size-tiered) would be used instead.
Found this problem while working on leveled compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-18 11:06:37 +03:00
Glauber Costa
e99e418238 schema_tables: make sure CF directory exists upon creation
In Cassandra, when you create a new column family, a directory for it
immediately appears under the KS directory.

In the past, we have made a decision to delay that creation until the first
SSTable is created, which works well in general.

There is a problem, however, for backup restoration: the standard procedure to
call loadNewSSTables is to do that in an empty directory. But the directory
simply won't be there until we create the first SSTable: bummer!

In the current incarnation of the code in schema_tables.cc, there is already
some code that runs on CPU0 only. That is a perfect place for the directory
creation. So let's do it.

After this patch, a directory for the CF appears right after the CF creation.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-17 13:08:07 +02:00
Avi Kivity
849464670c commitlog: make new segments more xfs-friendly
xfs doesn't like writes beyond eof (exactly at eof is fine), and due
to continuation reordering, we sometimes do that.

Fix by pre-truncating the segment to its maximum size.
2015-10-14 17:32:59 +03:00
Calle Wilund
206acd8b5b commitlog: Make reader handle pre-allocated files
Silently ignore, and assume eof, when reading zeroed file or chunk header data.
Reading entries already deals with this.
2015-10-14 17:32:23 +03:00
Calle Wilund
2729d5dd71 commitlog: ensure file size remains <= max_size
Re-check file size overflow after each cycle() call (new buffer),
otherwise we could write more, in the case where we are storing a mutation
larger than the current buffer size (current pos + sizeof(mut) < max_size, but
after the cycle required by sizeof(mut) > buf_remain, the former might not be
true anymore).
2015-10-14 17:32:22 +03:00
Avi Kivity
e252475e67 Merge "locator: Adding EC2Snitch" from Vlad
"This series adds EC2Snitch.

Since both GossipingPropertyFileSnitch and the EC2SnitchXXX family of
snitches use the same property file, it was logical to share the
corresponding code. Most of this series does just that... "
2015-10-11 14:55:26 +03:00
Glauber Costa
b2fef14ada do not calculate truncation time independently
Currently, we are calculating truncated_at during truncate() independently for
each shard. It will work if we're lucky, but it is fairly easy to trigger cases
in which each shard will end up with a slightly different time.

The main problem here is that this time is used as the snapshot name when auto
snapshots are enabled. Prior to my last fixes, this would just generate two
separate directories in this case, which is wrong but not severe.

But after the fix, this means that both shards will wait for one another to
synchronize and this will hang the database.

Fix this by making sure that the truncation time is calculated before
invoke_on_all in all needed places.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-09 17:17:11 +03:00
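The fix described in b2fef14ada (take the truncation time once, before fanning out to the shards) can be sketched as follows. This is a minimal illustration, not the actual Scylla code: `shard_state` and `truncate_all` are hypothetical names, and a plain loop over a vector stands in for `invoke_on_all`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using time_point = std::chrono::system_clock::time_point;

// Hypothetical per-shard state; here a "shard" is just a slot in a vector.
struct shard_state {
    time_point truncated_at{};
};

// Take the timestamp once, then hand the same value to every shard.
// If each shard called now() itself, the values could differ slightly
// and yield different snapshot names.
inline void truncate_all(std::vector<shard_state>& shards,
                         time_point now = std::chrono::system_clock::now()) {
    const time_point truncated_at = now;  // computed before the fan-out
    for (auto& s : shards) {
        s.truncated_at = truncated_at;    // stands in for invoke_on_all(...)
    }
}
```

Because every shard receives the same value, any name derived from it (such as a snapshot directory) is identical across shards.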
Vlad Zolotarov
de6cf8db51 db::config: add get_conf_dir()
This function returns the directory containing the configuration
files. It takes the environment variables into account as follows:
   - If SCYLLA_CONF is defined, this is the directory
   - else if SCYLLA_HOME is defined, then $SCYLLA_HOME/conf is the directory
   - else "conf" is the directory, i.e. the configuration files are
     looked up in ./conf

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Updated get_conf_dir() description.
2015-10-08 20:57:11 +03:00
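The lookup order described for get_conf_dir() in de6cf8db51 can be sketched as below. The function names and the injectable `lookup` parameter are assumptions made so the precedence can be exercised without touching the real environment; the actual signature in db::config may differ.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <string>

// Hypothetical helper: resolves the configuration directory using the
// precedence from the commit message. `lookup` returns nullptr when a
// variable is unset (the same contract as std::getenv).
inline std::string resolve_conf_dir(
        const std::function<const char*(const char*)>& lookup) {
    assert(lookup != nullptr);
    if (const char* conf = lookup("SCYLLA_CONF")) {
        return conf;                          // 1. SCYLLA_CONF wins
    }
    if (const char* home = lookup("SCYLLA_HOME")) {
        return std::string(home) + "/conf";   // 2. $SCYLLA_HOME/conf
    }
    return "conf";                            // 3. fall back to ./conf
}

// Convenience wrapper that reads the real process environment.
inline std::string resolve_conf_dir_from_env() {
    return resolve_conf_dir(
        [](const char* name) -> const char* { return std::getenv(name); });
}
```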
Pekka Enberg
95012793e5 db/schema_tables: Wire up drop keyspace notifications
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-08 13:10:48 +02:00
Calle Wilund
42c086a5cd batchlog_manager: Fixup includes + exception handling
* Fix exception handling in batch loop (report + still re-arm)
* Cleanup seastar include reference style
2015-10-07 17:06:34 +03:00
Calle Wilund
a4c14d3d1d batchlog_manager: Add hint of which cpu timer callback is running on 2015-10-07 14:57:55 +02:00
Calle Wilund
b46496da34 batchlog_manager: Rename logger
* More useful/referrable on command line (--log*)
* Matches class name (though not origin)
2015-10-07 14:30:09 +02:00
Calle Wilund
6f94a3bdad batchlog_manager: Use gate instead of semaphore
Since that exists now.
2015-10-07 14:30:09 +02:00
Calle Wilund
874da0eb67 batchlog_manager: Run timer loop on only one shard
Since replay is a "node global" operation, we should not attempt to
do it in parallel on each shard. It will just overlap/interfere.
We could just run this on cpu 0, but since this _could_ be a
lengthy operation, each timer callback is round-robined across shards just in case...
2015-10-07 14:30:09 +02:00
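The round-robin choice of shard mentioned in 874da0eb67 amounts to advancing an index modulo the shard count each time the timer fires. A minimal sketch, assuming a hypothetical helper `next_shard` (not a function from the Scylla source):

```cpp
#include <cassert>

// Hypothetical helper: given the shard that ran the previous timer
// callback, pick the shard that should run the next one.
inline unsigned next_shard(unsigned current, unsigned shard_count) {
    assert(shard_count > 0);
    return (current + 1) % shard_count;
}
```

Only one shard runs a replay at any time, but successive replays land on different shards, so a long replay does not always burden the same CPU.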
Calle Wilund
246e8e24f2 replay_position: Make <= comparator simpler and cleaner 2015-10-07 14:34:22 +03:00
Calle Wilund
a66c22f1ec commitlog_replayer: Acquire truncation RP:s per replayed shard
I.e. get them in bulk and fill in for all shards
2015-10-07 09:00:22 +02:00
Calle Wilund
17bd18b59c commitlog_replayer: Add logging message for exceptions in multi-file recover 2015-10-07 08:59:54 +02:00
Calle Wilund
3f1fa77979 commitlog_replayer: Fix broken comparison
A commitlog entry should be ignored if its position is <= highest recorded
position, not <.
2015-10-07 08:59:53 +02:00
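The `<=` comparison from 271eb3ba02 and its use in 3f1fa77979 can be illustrated with a simplified replay position. The struct layout (segment id plus offset) and the helper `should_skip` are assumptions for this sketch, not the project's definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Simplified stand-in for a replay position: a segment id plus an offset
// within that segment. The field names are assumptions.
struct replay_position {
    std::uint64_t id;
    std::uint32_t pos;
};

inline bool operator<=(const replay_position& a, const replay_position& b) {
    // Lexicographic comparison: segment id first, then offset.
    return std::tie(a.id, a.pos) <= std::tie(b.id, b.pos);
}

// An entry is skipped when it is at or below the highest recorded
// position; using `<` here would replay the boundary entry again.
inline bool should_skip(const replay_position& entry,
                        const replay_position& highest_recorded) {
    return entry <= highest_recorded;
}
```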
Calle Wilund
271eb3ba02 replay_position: Add <= comparator 2015-10-07 08:59:53 +02:00
Calle Wilund
6b0ab79ecb system_keyspace: Keep per-shard truncation records
Fixes  #423
* CF ID now maps to a truncation record comprised of a set of 
  per-shard RP:s and a high-mark timestamp
* Retrieving RP:s is done in "bulk"
* Truncation time is calculated as max of all shards.

This version of the patch will accept "old" truncation data, though the 
result of applying it will most likely not be correct (just one shard)

Record is still kept as a blob, "new" format is indicated by 
record size.
2015-10-07 08:59:52 +02:00
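The record shape described in 6b0ab79ecb (a set of per-shard RP:s plus a high-mark timestamp taken as the max over all shards) might look roughly like this. All type and field names are hypothetical, and the blob serialization mentioned in the commit is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified replay position (assumed layout).
struct replay_position {
    std::uint64_t id;
    std::uint32_t pos;
};

// What a single shard reports when it truncates (hypothetical).
struct shard_truncation {
    replay_position rp;
    std::int64_t truncated_at_ms;
};

// Combined record for one column family.
struct truncation_record {
    std::vector<replay_position> positions;  // one entry per shard
    std::int64_t time_stamp_ms;              // high-mark over all shards
};

inline truncation_record make_truncation_record(
        const std::vector<shard_truncation>& shards) {
    assert(!shards.empty());
    truncation_record rec;
    rec.time_stamp_ms = shards.front().truncated_at_ms;
    rec.positions.reserve(shards.size());
    for (const auto& s : shards) {
        rec.positions.push_back(s.rp);
        rec.time_stamp_ms = std::max(rec.time_stamp_ms, s.truncated_at_ms);
    }
    return rec;
}
```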
Calle Wilund
199b72c6f3 commitlog: fix reader "offset" handling broken + ensure exceptions propagates
Must ensure we still find a chunk/entry boundary even when run
with a start offset, since file navigation is chunk based.
Was not observed as broken previously because
1.) We did not run with offsets
2.) The exception never reached caller.

Also make the reader silently ignore empty files.
2015-10-07 08:54:49 +02:00
Calle Wilund
024041c752 commitlog: make log message slightly more informative/correct 2015-10-07 08:54:49 +02:00
Pekka Enberg
5878f62b18 db/schema_tables: Clean up indentation
Almost the whole file is (accidentally) indented four spaces to the
right for no reason. Fix that up because it's annoying as hell.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 17:09:27 +02:00
Pekka Enberg
1f9e769dd3 db/schema_tables: Remove obsolete ifdef'd code
Remove ifdef'd code that we won't be converting to C++ because of design
differences.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 17:09:27 +02:00
Pekka Enberg
6e304cd58c db/schema_tables: Fix merge_keyspaces() to actually drop keyspaces
When we query schema keyspaces after we have applied a delete mutation,
the dropped keyspace does not exist in the "after" result set. Fix the
merge_keyspaces() algorithm to take that into account.

Makes merge_keyspaces() really call database::drop_keyspace() when a
keyspace is dropped.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 14:53:35 +03:00
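The idea behind 6e304cd58c, that a dropped keyspace is one present in the "before" result set but absent from the "after" one, can be sketched with plain maps. `schema_result` and `dropped_names` are illustrative stand-ins, not the types used in schema_tables.cc.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplification: keyspace name -> serialized schema row.
using schema_result = std::map<std::string, std::string>;

// Names that existed before the mutation was applied but are missing
// afterwards; these are the ones that need an explicit drop call.
inline std::vector<std::string> dropped_names(const schema_result& before,
                                              const schema_result& after) {
    std::vector<std::string> dropped;
    for (const auto& kv : before) {
        if (after.find(kv.first) == after.end()) {
            dropped.push_back(kv.first);
        }
    }
    return dropped;
}
```

An algorithm that only iterates over the "after" set never sees the dropped entries, which is the failure mode the commit message describes.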
Pekka Enberg
5d9d1e28cb db/schema_tables: Implement make_drop_keyspace_mutations()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 14:53:35 +03:00
Pekka Enberg
633279415d db/schema_tables: Fix merge_tables() to actually drop tables
When we query schema tables after we have applied a delete mutation, the
dropped table does not exist in the "after" result set. Fix the
merge_tables() algorithm to take that into account.

Makes merge_tables() really call database::drop_column_family() when
a table is dropped.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
82d20dba65 db/schema_tables: Implement make_drop_table_mutations()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
b89b70daa8 db/schema_tables: Wire up drop column notifications
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
0651ab6901 database: Futurize drop_column_family() function
Futurize drop_column_family() so that we can call truncate() from it.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
b74a9d99d5 db/schema_tables: Fix UTF-8 serialization
Use the utf8_type to serialize strings instead of using to_bytes().

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-05 09:26:15 +02:00
Calle Wilund
7856d7fe02 config: Change "auto_snapshot" to "used" 2015-09-30 09:09:42 +02:00
Calle Wilund
b3c95ce42d system_keyspace: Change truncation record method to use context qp
Align with rest of file (for better or worse). This allows calls from
entity without query_processor handy (i.e. storage_proxy).

Added "minimal" setup method for the "global" state, to facilitate
tests. Doing a full setup either in cql_test_env or after it is created
breaks badly. (Not sure why). So quick workaround.

Updated the current two users (batchlog_manager and commitlog_replayer)
callsites to conform.
2015-09-30 09:09:41 +02:00
Calle Wilund
3abd8b38b6 query_context: Expose query_processor (local) 2015-09-30 09:09:41 +02:00
Avi Kivity
0ec0e32014 Merge "commitlog: preallocate segments" from Calle
"Modified version of the initial patch (which was reverted), further
reducing the possible delay states in CL allocation and segment management."
2015-09-29 17:02:54 +03:00
Pekka Enberg
f43f0d6f04 keys: Add compound_wrapper::from_singular()
Clean up code by adding a from_singular() helper function to compound
wrapper and use it in.
2015-09-28 16:29:44 +02:00
Calle Wilund
4941d91063 Commitlog: add some more verbosity 2015-09-22 12:57:33 +02:00
Paweł Dziepak
34e66e60c1 main: disable thrift by default
Fixes #205.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-22 09:48:44 +02:00
Calle Wilund
a10745cf0e Commitlog: Delay timer by period/ncpus for each cpu
To avoid having all shards doing sync at the same time.
2015-09-21 13:30:35 +02:00
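One reading of the staggering in a10745cf0e is that cpu i starts its sync timer i * period / ncpus later than cpu 0, which spreads the shards evenly over one period. The helper below is a sketch under that assumption; `initial_sync_delay` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: offset for the first timer tick on a given cpu so
// that the shards do not all sync at the same instant.
inline std::chrono::milliseconds initial_sync_delay(
        std::chrono::milliseconds period, unsigned cpu, unsigned ncpus) {
    assert(ncpus > 0 && cpu < ncpus);
    return period * cpu / ncpus;
}
```

With a 10 second period and 4 cpus, the shards would start at offsets of 0, 2.5, 5 and 7.5 seconds.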
Calle Wilund
dcabf8c1d2 Commitlog: Pre-allocate "reserve" segments
Refs #356

Pre-allocates N segments from timer task. N is "adaptive" in that it is
increased (to a max) every time segment acquisition is forced to allocate
a new one instead of picking from the pre-alloc (reserve) list. The idea is
that it is easier to adapt how many segments we consume per timer quantum
than the timer quantum itself.

Also does disk pressure check and flush from timer task now. Note that the
check is still only done max once every new segment.

Some logging cleanup/betterment also to make behaviour easier to trace.

Reserve segments start out at zero length, and are still deleted when finished.
This is because otherwise we'd still have to clear the file to be able to
properly parse it later (given that it can be a "half" file due to power fail
etc). This might need revisiting as well.

With this patch, there should be no case (except flush starvation) where
"add_mutation" actually waits for a (potentially) blocking op (disk).
Note that since the amount of reserve is increased as needed, there will
be occasional cases where a new segment is created in the alloc path
until the system finds equilibrium. But this should only be during a brief
warmup.

v2: Fixed timestamp not being reset on reserve acquire
2015-09-21 13:04:39 +02:00
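The adaptive reserve described in dcabf8c1d2 can be modelled as a small pool whose target size grows whenever the allocation path finds the reserve empty. This is a toy model: `reserve_pool`, the initial target of 1 and the in-memory `segment` are all assumptions, and no disk I/O or flushing is represented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

struct segment {
    int id;
};

class reserve_pool {
    std::deque<segment> _reserve;
    std::size_t _target = 1;      // "N" in the commit message (assumed start)
    std::size_t _max_target;
    int _next_id = 0;

    // Stands in for the potentially blocking creation of a segment file.
    segment allocate() { return segment{_next_id++}; }

public:
    explicit reserve_pool(std::size_t max_target) : _max_target(max_target) {
        assert(max_target >= 1);
    }

    // Timer task: top the reserve up to the current target.
    void on_timer() {
        while (_reserve.size() < _target) {
            _reserve.push_back(allocate());
        }
    }

    // Alloc path: prefer a pre-allocated segment. If none is available,
    // allocate inline and raise the target (up to the max) so that later
    // timer ticks keep more segments ready.
    segment acquire() {
        if (!_reserve.empty()) {
            segment s = _reserve.front();
            _reserve.pop_front();
            return s;
        }
        _target = std::min(_target + 1, _max_target);
        return allocate();
    }

    std::size_t target() const { return _target; }
    std::size_t available() const { return _reserve.size(); }
};
```

During warmup the inline allocation branch is taken a few times; once the target matches the consumption per timer tick, `acquire()` is served from the reserve.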
Pekka Enberg
6cef7d8270 db/schema_tables: Fix calculate_schema_digest()
map_reduce() can run the reducer out-of-order which breaks the MD5 hash.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

Fixes #357. [tgrabiec]
2015-09-21 11:51:17 +02:00
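The problem named in 6cef7d8270 is that an order-sensitive hash gives different results when its inputs are fed in a different order. The sketch below uses FNV-1a as a stand-in for MD5 (an assumption made to keep the example self-contained) and shows one possible way to make the digest deterministic: tag each result with its index and hash in index order. It is not claimed to be the fix the commit actually applied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a, used here only because it is order-sensitive like MD5.
constexpr std::uint64_t fnv_offset = 14695981039346656037ULL;
constexpr std::uint64_t fnv_prime = 1099511628211ULL;

inline std::uint64_t feed(std::uint64_t h, const std::string& chunk) {
    for (unsigned char c : chunk) {
        h ^= c;
        h *= fnv_prime;
    }
    return h;
}

// Digest of the results in exactly the order given.
inline std::uint64_t digest_in_order(const std::vector<std::string>& results) {
    std::uint64_t h = fnv_offset;
    for (const auto& r : results) {
        h = feed(h, r);
    }
    return h;
}

// A result tagged with the index of the table that produced it; results
// may complete in any order.
struct tagged_result {
    std::size_t index;
    std::string data;
};

// Place each result in its slot first, then hash in index order.
inline std::uint64_t digest_stable(const std::vector<tagged_result>& completed) {
    std::vector<std::string> ordered(completed.size());
    for (const auto& r : completed) {
        ordered.at(r.index) = r.data;
    }
    return digest_in_order(ordered);
}
```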
Avi Kivity
d5cf0fb2b1 Add license notices 2015-09-20 10:43:39 +03:00
Avi Kivity
dcdc925b86 Revert "Commitlog: Pre-allocate "reserve" segments"
This reverts commit cbf3b63853, due to
reports of increased latency (instead of the opposite).
2015-09-19 09:26:39 +03:00
Calle Wilund
cbf3b63853 Commitlog: Pre-allocate "reserve" segments
Refs #356

Pre-allocates N segments from timer task. N is "adaptive" in that it is
increased (to a max) every time segment acquisition is forced to allocate
a new one instead of picking from the pre-alloc (reserve) list. The idea is
that it is easier to adapt how many segments we consume per timer quantum
than the timer quantum itself.

Also does disk pressure check and flush from timer task now. Note that the
check is still only done max once every new segment.

Some logging cleanup/betterment also to make behaviour easier to trace.

Reserve segments start out at zero length, and are still deleted when finished.
This is because otherwise we'd still have to clear the file to be able to
properly parse it later (given that it can be a "half" file due to power fail
etc). This might need revisiting as well.

With this patch, there should be no case (except flush starvation) where
"add_mutation" actually waits for a (potentially) blocking op (disk).
Note that since the amount of reserve is increased as needed, there will
be occasional cases where a new segment is created in the alloc path
until the system finds equilibrium. But this should only be during a brief
warmup.
2015-09-17 19:54:28 +03:00
Calle Wilund
b512192b3b Commitlog: Fix some timing/latency issues with sync
Refs #356

* Move sync time setting to sync initiate to help prevent double syncs
* Change add_mutation to only do explicit sync with wait if time elapsed
  since last is 2x sync window
* Do not wait for sync when moving to new segment in alloc path
* Initiate _sync_time properly.
* Add some tracing log messages to help debug
2015-09-16 20:07:25 +03:00
Calle Wilund
d42ff89e83 Config: Promote logging of unhandled options to warning
Fixes #222
2015-09-16 15:43:53 +03:00
Calle Wilund
bf727b2272 config.cc : add logging of unset attributes
Helps checking for missing stuff in scylla.yaml
2015-09-16 15:43:35 +03:00
Calle Wilund
8172717ba0 config.hh : update some default values to match scylla.conf 2015-09-16 15:43:35 +03:00
Calle Wilund
04562b23b4 commitlog_replayer: More correct fix for reordering issue in replay
* Removes previous, accidental fix that got committed.
* Instead just do not give RP:s to replay mutations. This is same as in Origin,
  and just as/more correct, since we intend to flush the data to sstables
  asap anyway
2015-09-16 15:41:17 +03:00