Usage for end-users

The main goal of DataDeps.jl is to simplify life for the user. They should just forget about the data their package needs.

Moving Data

Moving data is a great idea, and DataDeps.jl is in favour of it. When data is automatically downloaded, it will almost always go to the same location: the first (existent, writable) directory on your DATADEPS_LOAD_PATH, which by default is ~/.julia/datadeps/. (If you delete this directory, the data will go to another location.) But you can move it from there to anywhere in the DATADEPS_LOAD_PATH. (See below.)

If you have a large chunk of data that everyone in your lab is using (e.g. a 1TB video corpus), you probably want to shift it to a shared area, like /usr/share/datadeps. Even if you don't have write permissions there, you can have a sysadmin move it, and so long as you still have read permission DataDeps.jl will find it and use it for you.
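Moving an already-downloaded data dependency is an ordinary directory move. The following is a minimal sketch using only Julia's standard file functions; the helper name `move_datadep` and the datadep name in the commented usage line are hypothetical, and whoever runs it needs write permission on the destination.

```julia
# Hypothetical helper: move the folder of one datadep from one load path
# directory to another. Uses only Base file operations, not the DataDeps.jl API.
function move_datadep(name::AbstractString, from_dir::AbstractString, to_dir::AbstractString)
    src = joinpath(from_dir, name)
    dst = joinpath(to_dir, name)
    isdir(src) || error("no datadep folder at $src")
    ispath(dst) && error("destination $dst already exists")
    mkpath(to_dir)   # create the target directory if it does not exist yet
    mv(src, dst)     # move the whole folder, including its contents
    return dst
end

# Example call (placeholder names):
# move_datadep("EXAMPLEDATADEP", joinpath(homedir(), ".julia", "datadeps"), "/usr/share/datadeps")
```

Refusing to overwrite an existing destination is a deliberate choice here, since it avoids silently replacing a copy that other users may already rely on.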

The Load Path

The Load Path is the list of paths that DataDeps.jl looks in when trying to resolve a data dependency. If it doesn't find the data in any of them it will download the data.

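It has 3 sources:

- the package load path: the deps/data folder of the package that declared the data dependency, e.g. ~/.julia/v0.6/EXAMPLEPKG/deps/data
- the user-defined load path: the contents of the environment variable DATADEPS_LOAD_PATH
- the standard load path: a set of default locations that depends on your system (see the standard load path listings for Unix and Windows)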

In general it should by default include just about anywhere you might want to put the data. If it doesn't, please file an issue (unless your location is super-specific, e.g. /MyUniName/student/commons/datadeps). As mentioned, you can add things to the load path by setting the environment variable DATADEPS_LOAD_PATH. You can also make symlinks from locations on the load path to other locations where the data really is, if you'd rather do that.
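The symlink option can be done from Julia itself. This is a minimal sketch using `Base.symlink`; the helper `link_datadep` is hypothetical and not part of DataDeps.jl, and on Windows creating symlinks may need extra privileges.

```julia
# Hypothetical helper: make a datadep called `name` appear inside a load path
# directory while the data itself stays at `real_location`.
function link_datadep(name::AbstractString, loadpath_dir::AbstractString, real_location::AbstractString)
    isdir(real_location) || error("no data folder at $real_location")
    link = joinpath(loadpath_dir, name)
    mkpath(loadpath_dir)           # make sure the load path directory exists
    symlink(real_location, link)   # symlink(target, link)
    return link
end

# Example call (placeholder paths):
# link_datadep("EXAMPLEDATADEP", joinpath(homedir(), "datadeps"), "/MyUniName/student/commons/datadeps/EXAMPLEDATADEP")
```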

When loading data, the load path is searched in order for a readable folder of the right name. When saving data, it is searched in order, skipping the package load path, for a writable directory. A simple way to stop part of the standard load path being used for saving is to delete it, or make it unwritable. You can (and should, when desired) move things around between any folders in the load path without redownloading.
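The search order can be sketched in a few lines of plain Julia. This illustrates the behaviour described here and is not DataDeps.jl's actual code; the function names are hypothetical, and the permission checks only look at the owner's permission bits, which is an approximation.

```julia
# Approximate permission checks based on the owner's permission bits.
is_readable_dir(dir) = isdir(dir) && (uperm(dir) & 0x04) != 0
is_writable_dir(dir) = isdir(dir) && (uperm(dir) & 0x02) != 0

# Loading: return the first readable folder called `name` on the load path.
function find_for_load(name::AbstractString, load_path::Vector{String})
    for dir in load_path
        candidate = joinpath(dir, name)
        is_readable_dir(candidate) && return candidate
    end
    return nothing   # found nowhere: this is when the data would be downloaded
end

# Saving: return the first existing, writable directory. The caller is assumed
# to have left the package load path out of `save_path` already.
function find_for_save(save_path::Vector{String})
    for dir in save_path
        is_writable_dir(dir) && return dir
    end
    return nothing
end
```

Because missing and unwritable directories are simply skipped, deleting a directory or making it unwritable is enough to push saved data to the next entry.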

Unix Standard Load Path

For the user oxinabox

/home/wheel/oxinabox/.julia/datadeps
/home/wheel/oxinabox/datadeps
/scratch/datadeps
/staging/datadeps
/usr/share/datadeps
/usr/local/share/datadeps

Windows Standard Load Path

For the user oxinabox, when using JuliaPro 0.6.2.1 on Windows 7 (other configurations should be fairly similar):

C:\Users\oxinabox\AppData\Local\JuliaPro-0.6.2.1\pkgs-0.6.2.1\datadeps
C:\Users\oxinabox\datadeps
C:\Users\oxinabox\AppData\Roaming\datadeps
C:\Users\oxinabox\AppData\Local\datadeps
C:\ProgramData\datadeps
C:\Users\Public\datadeps

Having multiple copies of the same DataDir

You probably don't want to have multiple copies of a DataDir with the same name. DataDeps.jl will try to handle it as gracefully as it can, but having different DataDeps under the same name is probably going to lead to packages loading the wrong one, except if they are (both) located in their package's deps/data folder.

By moving a package's data dependency into its package directory under deps/data, it becomes invisible except to that package. For example ~/.julia/v0.6/EXAMPLEPKG/deps/data/EXAMPLEDATADEP/, for the package EXAMPLEPKG, and the datadep EXAMPLEDATADEP.

Ideally though you should probably raise an issue with the package maintainers and see if one (or both) of them want to change the DataDep name.

Note also that file-level loading, e.g. datadep"Name/subfolder/file.txt", does not check all folders with that Name (if you have multiples). If the file is not in the first folder it finds, you will be presented with the recovery dialog. The easiest option there is to delete the folder and retry, since that will result in it checking the second folder (as the first one no longer exists).
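The first-folder-only behaviour can be illustrated with a small sketch. It is an illustration under simplifying assumptions, not DataDeps.jl's actual code: `resolve_file` is a hypothetical function, and returning `nothing` stands in for both "not found anywhere" and "recovery dialog shown".

```julia
# Split "Name/subfolder/file.txt" into the datadep name and the inner path,
# then look only in the FIRST folder on the load path that has that name.
function resolve_file(datadep_path::AbstractString, load_path::Vector{String})
    parts = String.(split(datadep_path, '/'))
    name, inner = parts[1], parts[2:end]
    for dir in load_path
        folder = joinpath(dir, name)
        if isdir(folder)
            file = joinpath(folder, inner...)
            # Later folders with the same name are never consulted.
            return ispath(file) ? file : nothing
        end
    end
    return nothing
end
```

Deleting the first folder changes which folder the loop reaches first, which is why the delete-and-retry option ends up finding the file in the second one.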

Configuration

Currently, configuration is done via environment variables. It is likely to stay that way, as they are also easy to set up in CI tools. You can set these in the .juliarc.jl file using the ENV dictionary if you don't want to mess up your .profile. However, most people shouldn't need to: DataDeps.jl tries to have very sensible defaults.
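Setting a variable through the ENV dictionary looks like the sketch below. It only uses the two variable names mentioned in this document; the value "true" and the directory are assumptions for illustration, and the lines are assumed to run before any data dependency is first requested.

```julia
# Lines like these could go in your Julia startup file.

# Assumption: "true" is a value that counts as the variable being set.
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

# Placeholder directory to add to the load path.
ENV["DATADEPS_LOAD_PATH"] = "/MyUniName/student/commons/datadeps"
```

Values set this way apply only to that Julia process and anything it launches, so your shell profile stays untouched.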

- This is used by the default `fetch_method` and when implementing custom methods it is good to respect it.
- default: `5` (seconds) usually; `Inf` (i.e. no updates) if `DATADEPS_ALWAYS_ACCEPT` is set.