About
Filetracker is a module that provides shared storage for files, together with some extra metadata.
It was designed with the intent to be used along with a relational database in cases where large files need to be stored and accessed from multiple locations, but storing them as blobs in the database is not suitable.
Filetracker base supports caching of files downloaded from the remote master store.
The Filetracker API allows versioning of the stored files, but implementing it is optional and the default store classes do not provide it.
Files, names and versions
A file may contain arbitrary data. Each file has a name that looks like an absolute filesystem path: components are separated by slashes, and the name must begin with a slash. Filetracker does not support folders explicitly. For now, you may assume that a file in filetracker is identified by its name, which by convention looks like a filesystem path. Future versions may rely on this convention, so please follow it.
Many methods accept or return versioned names, which look like regular names with a version number appended, separated by @. For those methods, passing an unversioned name usually means “the latest version of that file”.
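The naming convention can be illustrated with a small sketch. Both helpers below are hypothetical and are not part of filetracker; the check that an unversioned name contains no @ is an assumption made here to keep versioned names unambiguous.

```python
def is_valid_name(name):
    """Hypothetical check: the name starts with a slash and carries no version."""
    # Assumption: '@' is reserved for the version separator.
    return name.startswith("/") and "@" not in name


def versioned_name(name, version=None):
    """Append '@<version>' to an unversioned name; None leaves the name as-is."""
    if not is_valid_name(name):
        raise ValueError("expected an unversioned name starting with '/': %r" % (name,))
    if version is None:
        # An unversioned name usually means "the latest version".
        return name
    return "%s@%s" % (name, version)
```

For example, `versioned_name("/reports/2020.pdf", 3)` yields `/reports/2020.pdf@3`, while omitting the version returns the name unchanged.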
Configuration and usage
The only class you will probably need to know and use is Client.
-
class filetracker.Client(local_store='auto', remote_store='auto', lock_manager='auto', cache_dir=None, remote_url=None, locks_dir=None)
The main filetracker client class.
The client instance can be built in one of several ways. The easiest is to call the constructor without arguments. In this case the configuration is taken from the environment variables:
- FILETRACKER_DIR: the folder to use as the local cache; if not specified, ~/.filetracker-store is used.
- FILETRACKER_URL: the URL of the filetracker server; if not present, the constructed client is a stand-alone local client, which stores the files and metadata locally. Such a client can also be used safely by multiple processes on the same machine.
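The documented defaults can be summarized in a short sketch. The helper `resolve_client_config` is hypothetical: it only mirrors the rules stated here and is not how the library itself reads its configuration.

```python
import os

DEFAULT_CACHE_DIR = "~/.filetracker-store"


def resolve_client_config(environ=None):
    """Hypothetical helper mirroring the documented environment defaults."""
    if environ is None:
        environ = os.environ
    # FILETRACKER_DIR falls back to ~/.filetracker-store when unset.
    cache_dir = os.path.expanduser(environ.get("FILETRACKER_DIR", DEFAULT_CACHE_DIR))
    # A missing FILETRACKER_URL means a stand-alone local client.
    remote_url = environ.get("FILETRACKER_URL")
    return {
        "cache_dir": cache_dir,
        "remote_url": remote_url,
        "standalone": remote_url is None,
    }
```

Passing a plain dictionary instead of the real environment makes it easy to see which settings lead to a stand-alone client.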
Another way to create a client is to pass these values as constructor arguments: remote_url and cache_dir.
If you are a power user, you may create the client by manually passing local_store and remote_store to the constructor (see Filetracker Cache Cleaner).
-
delete_file(name)
Deletes the file identified by name along with its metadata.
The file is removed from both the local store and the remote store.
-
file_size(name, force_refresh=False)
Returns the size of the file.
For efficiency this operation does not use locking, so it may return inconsistent data. Use it for informational purposes only.
-
file_version(name)
Returns the newest available version number of the file.
If the remote store is configured, it is queried; otherwise the local version is returned. It is assumed that the remote store always has the newest version of the file.
If a version is part of name, it is ignored.
-
get_file(name, save_to, add_to_cache=True, force_refresh=False, _lock_exclusive=False)
Retrieves the file identified by name.
The file is saved as save_to. If add_to_cache is True, the file is added to the local store. If force_refresh is True, the local cache is not examined if a remote store is configured.
If a remote store is configured, but name does not contain a version, the local data store is not used, as we cannot guarantee that the version there is fresh.
The local data store implemented in LocalDataStore tries not to copy the entire file to save_to if possible, and uses hardlinking instead. Therefore you should not modify the retrieved file in place, as doing so would also change the cached copy.
This method returns the full versioned name of the retrieved file.
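The rules for when the local store may be consulted can be condensed into a small predicate. This is a sketch of the documented behavior only; `may_serve_from_local_store` is a hypothetical name, and detecting a version by looking for @ is an assumption based on the versioned-name convention.

```python
def may_serve_from_local_store(has_remote, name, force_refresh=False):
    """Hypothetical predicate summarizing the documented get_file rules."""
    if not has_remote:
        # With no remote store configured, the local store is the only source.
        return True
    if force_refresh:
        # A forced refresh skips the local cache when a remote store exists.
        return False
    # Without a version in the name, freshness cannot be guaranteed locally.
    return "@" in name
```

Under these assumptions, only a fully versioned name without a forced refresh lets a client with a remote store read from its local cache.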
-
get_stream(name, force_refresh=False, serve_from_cache=False)
Retrieves the file identified by name in streaming mode.
Works like get_file(), except that it returns a tuple (file-like object, versioned name).
When both remote_store and local_store are present, serve_from_cache can be used to ensure that the file is downloaded to and served from the local cache. If a full version is specified and the file exists in the cache, the file is always served locally.
-
list_local_files()
Returns a list of all files stored locally.
Each element of this list is of type DataStore.FileInfoEntry.
-
put_file(name, filename, to_local_store=True, to_remote_store=True)
Adds the file filename to the filetracker under the name name.
If the file already exists, a new version is created. In practice, if the store does not support versioning, the file is overwritten.
The file may be added to the local store only (if to_remote_store is False), to the remote store only (if to_local_store is False), or to both. If only one store is configured, the values of to_local_store and to_remote_store are ignored.
The local data store implemented in LocalDataStore tries not to copy the data directly to the final cache destination, and uses hardlinking instead. Therefore you should not modify the file in place later, as this would also change the stored copy.
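The hardlinking caveat is easy to reproduce with the standard library alone. The sketch below does not use filetracker; it links two paths by hand with `os.link` (which needs a filesystem that supports hard links) and shows that an in-place edit through one path is visible through the other, while editing a private copy is not. All file names are placeholders.

```python
import os
import shutil
import tempfile


def demo_hardlink_hazard():
    """Return the 'cached' content after editing a copy, then after editing a hardlink."""
    with tempfile.TemporaryDirectory() as tmp:
        cached = os.path.join(tmp, "cached.txt")    # stands in for the store's file
        fetched = os.path.join(tmp, "fetched.txt")  # stands in for a hardlinked result
        private = os.path.join(tmp, "private.txt")  # an independent copy

        with open(cached, "w") as f:
            f.write("original")
        os.link(cached, fetched)  # both paths now name the same data

        # Safe: modify a real copy; the cached file is untouched.
        shutil.copyfile(fetched, private)
        with open(private, "a") as f:
            f.write(" + edit")
        with open(cached) as f:
            after_copy_edit = f.read()

        # Unsafe: modify the hardlinked file in place; the cached file changes too.
        with open(fetched, "a") as f:
            f.write(" + edit")
        with open(cached) as f:
            after_link_edit = f.read()

        return after_copy_edit, after_link_edit
```

If a file obtained from or given to the store has to be changed, copying it first (for example with `shutil.copyfile`) keeps the stored data intact.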
If you write tests, you may also be interested in filetracker.dummy.DummyClient.
Filetracker server
At some point you probably want to run a filetracker server, so that more than one machine can share the store. Just do:
$ filetracker-server --help
This script can be used to start the metadata and file servers with minimal effort.
Using filetracker from the shell
No programmer can live without a way to fiddle with filetracker from the shell:
$ filetracker --help
Filetracker Cache Cleaner
For usage, please run:
$ filetracker-cache-cleaner --help
-
class filetracker.cachecleaner.CacheCleaner(cache_size_limit, glob_cache_dirs, scan_interval=datetime.timedelta(0, 600), percent_cleaning_level=50.0)
Tool for periodically cleaning the filetracker cache. Designed to work as a daemon.
The cache cleaner is run by calling the run() method. It supports multiple instances of Client. Configuration is passed as constructor parameters:
Parameters:
- cache_size_limit (int): soft limit for the total logical size of the cached files
- glob_cache_dirs (iterable): list of paths to filetracker.Client cache directories, given as glob expressions
- scan_interval (datetime.timedelta): how often to scan the disk and, if needed, clean the cache
- percent_cleaning_level (float): the percentage of cache_size_limit, counted over the newest cached files, that is kept (not deleted) when the cache is cleaned
The cache cleaner runs the following algorithm:
1. Ask each client (specified in the constructor by its cache directory) to list all stored files. The result is the file index.
2. Analyze the file index: check whether the cache should be cleaned and, if so, exactly which files to delete.
3. Clean the cache if necessary.
4. Wait for the time specified in the constructor.
5. Go to step 1.
Files are deleted from the oldest to the newest, based on modification time. If two files have the same modification time, the larger one is deleted first.
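The analysis step and the deletion order can be sketched as follows. This is one possible reading of the parameters, not the actual CacheCleaner code: it assumes cleaning starts only when the total size exceeds cache_size_limit and stops once the remaining size fits within percent_cleaning_level percent of that limit. The `Entry` type and `files_to_delete` function are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for a file index entry: name, modification time, size.
Entry = namedtuple("Entry", ["name", "mtime", "size"])


def files_to_delete(entries, cache_size_limit, percent_cleaning_level=50.0):
    """Pick files to delete, oldest first; larger files first on equal mtime."""
    total = sum(e.size for e in entries)
    if total <= cache_size_limit:
        return []  # Soft limit not exceeded: nothing to clean.
    # Assumed target: keep at most this many bytes of the newest files.
    keep_budget = cache_size_limit * percent_cleaning_level / 100.0
    ordered = sorted(entries, key=lambda e: (e.mtime, -e.size))
    doomed = []
    for entry in ordered:
        if total <= keep_budget:
            break
        doomed.append(entry)
        total -= entry.size
    return doomed
```

With four files of sizes 10, 30, 20 and 40, a limit of 80 and the default level of 50, this sketch removes the two oldest files (the larger one first) and then the next oldest, leaving only the newest 40-byte file.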
-
class FileIndexEntry(file_info, client)
Entry for the file index.
Associates a DataStore.FileInfoEntry with the filetracker.Client that owns the given file.
Fields:
- file_info: instance of DataStore.FileInfoEntry
- client: instance of filetracker.Client which owns the given file
-
client
Alias for field number 1
-
file_info
Alias for field number 0
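The "Alias for field number" descriptions suggest that FileIndexEntry behaves like a named tuple. The sketch below rebuilds an equivalent type with `collections.namedtuple` to show how the fields can be accessed; the dictionary and string used as values are placeholders, not real DataStore.FileInfoEntry or Client objects.

```python
from collections import namedtuple

# Assumed equivalent of the documented type: field 0 is file_info, field 1 is client.
FileIndexEntry = namedtuple("FileIndexEntry", ["file_info", "client"])

# Placeholder values stand in for a FileInfoEntry and a Client instance.
entry = FileIndexEntry(file_info={"name": "/a/b.txt", "size": 10}, client="client-A")

# Fields are reachable by name, by position, or by unpacking.
file_info, client = entry
```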
Internal API Reference
-
filetracker.split_name(name)
Splits a (possibly versioned) name into unversioned name and version.
Returns a tuple (unversioned_name, version), where version may be None.
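The documented contract can be sketched without the library. The function below is a stand-in named `split_name_sketch`, not the actual implementation; treating the version as an integer is an assumption based on the phrase "version number".

```python
def split_name_sketch(name):
    """Split 'path@version' into (path, version); version is None if absent."""
    path, sep, version = name.rpartition("@")
    if not sep:
        # No '@' present: the whole string is the unversioned name.
        return name, None
    # Assumption: versions are integers.
    return path, int(version)
```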
To-dos and ideas
- access control
- cache pruning
- support for “directories”: especially ls
- fuse client
- rm