API Reference


Public API

DataDep
ManualDataDep
register
@datadep_str
download

Helpers

DataDeps.unpack (Function)
unpack(f; keep_originals=false)

Extracts the content of an archive into the current directory, deleting the original archive unless the keep_originals flag is set.
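
For instance, a minimal sketch of calling it directly (the archive filename here is a placeholder):

```julia
using DataDeps

# Extract the archive into the current directory and delete it afterwards (placeholder filename):
unpack("mydata.tar.gz")

# Or keep the original archive around as well:
# unpack("mydata.tar.gz"; keep_originals=true)
```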

Internal

DataDep(
    name::String,
    message::String,
    remote_path::Union{String,Vector{String}...},
    [checksum::Union{String,Vector{String}...},]; # Optional, if not provided will generate
    # keyword args (Optional):
    fetch_method=fetch_http # (remote_filepath, local_directory)->local_filepath
    post_fetch_method=identity # (local_filepath)->Any
)

Required Fields

  • **Name**: the name used to refer to this datadep, corresponding to the folder name where it will be stored
    • It can have spaces or any other character that is allowed in a Windows filename (which is a strict subset of what is allowed in Unix filenames).
  • **Message**: a message displayed to the user when they are asked if they want to download it.
    • This is normally used to give a link to the original source of the data, a paper to be cited, etc.
  • **remote_path**: where to fetch the data from; normally a URL (or URLs).
    • This is usually a string, or a vector of strings (or a vector of vectors... see Recursive Structure below).

Optional Fields

  • **checksum**: this is very flexible; it is used to check that the files downloaded correctly
    • By far the most common use is to just provide a SHA256 sum as a hex-string for the files.
    • If not provided, then a warning message with the SHA256 sum is displayed. This is to help package devs work out the sum for their files, without using an external tool.
    • If you want to use a different hashing algorithm, then you can provide a tuple (hashfun, targethex)
      • hashfun should be a function which takes an IOStream and returns a Vector{UInt8}
        • such as any of the functions from [SHA.jl](https://github.com/staticfloat/SHA.jl), e.g. `sha3_384`, `sha2_512`
        • or `md5` from [MD5.jl](https://github.com/oxinabox/MD5.jl)
    • If you want to use a different hashing algorithm, but don't know the sum, you can provide just the hashfun and a warning message will be displayed, giving the correct tuple of (hashfun, targethex) that should be added to the registration block.
    • If you don't want to provide a checksum, because your data can change, pass in the type `Any`, which will suppress the warning messages. (But see the above warnings about "what if my data is dynamic".)
    • Can take a vector of checksums, one for each file, or a single checksum, in which case the per-file hashes are `xor`ed to get the target hash. (See Recursive Structure below.)
  • **fetch_method**=fetch_http: a function to run to download the files.
    • The function should take 2 parameters (remotepath, local_directory), and must return a local filepath.
    • It is responsible for determining what the local filename should be.
    • Change this to change the transfer protocol, for example to use an authenticated connection.
    • The default fetch_http is a wrapper around Base.download, which invokes command-line download tools.
    • Can take a vector of methods, one for each file, or a single method, in which case that method is used to download all of them. (See Recursive Structure below.)
    • Very few people will need to override this if they are just downloading public HTTP files.
  • **post_fetch_method**: a function to run after the files have been downloaded.
    • Should take the local filepath as its first and only argument. Can return anything.
    • Default is to do nothing.
    • Can do what it wants from there, but most likely wants to extract the file into the data directory.
    • Towards this end DataDeps includes a command, unpack, which will extract a compressed file, deleting the original (see the registration sketch after this list).
    • Note that post_fetch_method runs from within the data directory:
      • which means operations that just write to the current working directory (like rm or mv or run(`SOMECMD`)) just work.
      • You can call pwd() to get the data directory for your own functions. (Or dirname(local_filepath).)
    • Can take a vector of methods, one for each file, or a single method, in which case that same method is applied to all of the files. (See Recursive Structure in the README.md.)
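
To show how these fields fit together, here is a sketch of a hypothetical registration block; the name, message, URL, and checksum are placeholders rather than a real dataset:

```julia
using DataDeps

register(DataDep(
    "MyDataset",                                  # name: also the folder the data is stored under
    """
    MyDataset: a placeholder description.
    Website: https://example.com
    Please cite the original source if you use this data.
    """,                                          # message shown to the user before downloading
    "https://example.com/mydataset.tar.gz",       # remote_path (placeholder URL)
    "0000000000000000000000000000000000000000000000000000000000000000";  # SHA256 hex-string (placeholder)
    post_fetch_method=unpack,                     # extract the archive after download
))
```
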
DataDeps.resolve (Method)
resolve("name/path", @__FILE__)

This is the function that lives directly behind the datadep"name/path" macro. If you are working with the names of the datadeps programmatically, and don't want to download them by mistake, it can be easier to work with this function.

Note though that you must include @__FILE__ as the second argument, as DataDeps.jl uses this to allow reading the package-specific deps/data directory. Advanced usage could specify a different file or nothing, but at that point you are on your own.
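
For example, a minimal sketch (reusing the hypothetical "MyDataset" name from the registration sketch above):

```julia
# Resolve programmatically instead of via the string macro; downloads the data if not yet present.
dir = DataDeps.resolve("MyDataset", @__FILE__)

# A file within the datadep can be resolved the same way:
filepath = DataDeps.resolve("MyDataset/somefile.csv", @__FILE__)
```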

DataDeps.resolve (Method)
resolve(datadep, inner_filepath, calling_filepath)

Returns a path to the folder containing the datadep, even if that means downloading the dependency and putting it in there.

 - `inner_filepath` is the path to the file within the data dir
 - `calling_filepath` is a path to the file where this is being invoked from

This is basically the function that lives behind the string macro datadep"DepName/inner_filepath".

DataDeps.unpack (Method)
unpack(f; keep_originals=false)

Extracts the content of an archive into the current directory, deleting the original archive unless the keep_originals flag is set.

`datadep"Name"` or `datadep"Name/file"`

Use this just like you would a file path, except that you can refer by name to the datadep. The name alone will resolve to the corresponding folder, even if that means it has to be downloaded first. Adding a path within it functions as expected.
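
A minimal usage sketch (the datadep name and file are placeholders):

```julia
using DataDeps

folder = datadep"MyDataset"                 # resolves to the datadep's folder, downloading it first if needed
filepath = datadep"MyDataset/somefile.csv"  # a particular file inside the datadep
data = read(filepath, String)
```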

DisabledError: for when functionality that is disabled is attempted to be used.

For when there is no valid location available to save to.

For when a user has chosen to abort.

Base.download (Method)
Base.download(
    datadep::DataDep,
    localdir;
    remotepath=datadep.remotepath,
    skip_checksum=false,
    i_accept_the_terms_of_use=nothing)

A method to download a datadep. Normally, you do not have to download a data dependency manually. If you simply cause the string macro datadep"DepName" to be executed, it will be downloaded if not already present.

Invoking this download method manually is normally for purposes of debugging. As such, it includes a number of parameters that most people will not want to use.

  • localdir: this is the local directory to save to.
  • remotepath: the remote path to fetch the data from. Use this e.g. if you can't access the normal path where the data should be, but have an alternative.
  • skip_checksum: setting this to true causes the checksum not to be checked. Use this if the data has changed since the checksum was set in the registry, or if for some reason you want to download different data.
  • i_accept_the_terms_of_use: use this to bypass the "I agree to terms" screen. Useful if you are scripting the whole process, or using another system to get confirmation of acceptance.
    • For automation purposes you can set the environment variable DATADEPS_ALWAYS_ACCEPT.
    • If not set, and if DATADEPS_ALWAYS_ACCEPT is not set, then the user will be prompted.
    • Strictly speaking these are not always terms of use; it just refers to the message and permission to download.

If you need more control than this, then your best bet is to construct a new DataDep object, based on the original, and then invoke download on that.
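
As a sketch, a manual download for debugging might look like the following; the DataDep here is constructed inline with placeholder values rather than looked up from a registration:

```julia
using DataDeps

dep = DataDep(
    "MyDataset",                          # placeholder name
    "Placeholder message",
    "https://example.com/mydataset.csv",  # placeholder URL
)

# Download into a temporary directory, skipping the checksum since none was registered above:
download(dep, mktempdir(); skip_checksum=true, i_accept_the_terms_of_use=true)
```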

DataDeps._resolve (Method)

The core of the resolve function, without any of the user-friendly file handling; returns the directory.

accept_terms(datadep, localpath, remotepath, i_accept_the_terms_of_use)

Ensures the user accepts the terms of use; otherwise errors out.

DataDeps.checksum (Method)
checksum(hasher=sha2_256, filename[/s])

Executes the hasher on the file (or files), and returns a UInt8 array of the hash, xored if there are multiple files.
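
For example, a package dev could work out the hex-string for a registration block like this (a sketch; the filename is a placeholder, and the first call assumes the (hasher, filename) form of the signature above):

```julia
using SHA

# Via the internal helper shown above:
println(bytes2hex(DataDeps.checksum(sha2_256, "mydata.csv")))

# Or directly with SHA.jl:
open("mydata.csv") do io
    println(bytes2hex(sha2_256(io)))
end
```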

checksum_pass(hash, fetched_path)

Ensures the checksum passes, and handles the dialog with the user when it fails.

determine_save_path(name)

Determines the location to save a datadep with the given name to.

ensure_download_permitted()

This function will throw an error if download functionality has been disabled. Otherwise it will do nothing.

DataDeps.env_bool (Function)
env_bool(key)

Checks for an environment variable and fuzzily converts it to a Bool.

DataDeps.env_list (Function)
env_list(key)

Checks for an environment variable and converts it to a list of strings, separated by colons.
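
A sketch of the kind of environment variables these helpers are used to read; DATADEPS_ALWAYS_ACCEPT appears elsewhere on this page, while DATADEPS_LOAD_PATH is assumed here purely as an example of a colon-separated list variable:

```julia
# A boolean-like variable, read with env_bool:
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

# A colon-separated list variable, read with env_list (assumed example):
ENV["DATADEPS_LOAD_PATH"] = "/data/deps:/more/deps"
```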

fetch_http(remotepath, localdir; update_period=5)

Pass in an HTTP[/S] URL and a directory to save it to, and it downloads that file, returning the local path it was downloaded to. This uses the HTTP protocol's method of defining filenames in headers, if that information is present.

update_period controls how often to print the download progress to the log. It is expressed in seconds, and is printed at @info level in the log. The default depends on configuration (see progress_update_period).
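
A minimal sketch (placeholder URL, downloading into a temporary directory):

```julia
localpath = DataDeps.fetch_http(
    "https://example.com/mydata.csv",  # placeholder URL
    mktempdir();                       # directory to save into
    update_period=10,                  # log progress every 10 seconds
)
```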

handle_missing(datadep::DataDep, calling_filepath)::String

This function is called when the datadep is missing.

DataDeps.input_bool (Function)
input_bool

Prompts the user for a yes or no.

input_choice

Prompts the user for one of a list of options.

input_choice

Prompts the user for one of a list of options. Takes a vararg of tuples of Letter, Prompt, Action (a 0-argument function).

Example:

input_choice(
    ('A', "Abort -- errors out", ()->error("aborted")),
    ('X', "eXit -- exits normally", ()->exit()),
    ('C', "Continue -- continues running", ()->nothing),
)
is_valid_name(name)

This checks if a datadep name is valid. This basically means it must be a valid folder name on Windows.

list_local_paths( name|datadep, [calling_filepath|module|nothing])

Lists all the local paths to a given datadep. This may be an empty list.

preferred_paths(calling_filepath; use_package_dir=true)

Returns the DataDeps load path. Additionally, if calling_filepath is provided, use_package_dir=true, and the file is currently inside a package directory, then the path to the datadeps directory in that package is also included.

progress_update_period()

Returns the period between updates being logged on the progress. This is used by the default fetch_method, and it is generally a good idea to use it in any custom fetch method too, if possible.

If a vector of paths is provided, and a vector of hashing methods (of any form), then they are all required to match.

Providing only a hash string results in defaulting to sha2_256, with that string being the target.

If only a function is provided, then assume the user is a developer, wanting to know what hash-line to add to the Registration line.

If nothing is provided, then assume the user is a developer, wanting to know what sha2_256 hash-line to add to the Registration line.
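
To summarise the accepted forms, a sketch (the hex values are placeholders, not real hashes):

```julia
using SHA, MD5

checksum_hex    = "00c0ffee"                # plain hex-string: taken to be a sha2_256 sum
checksum_tuple  = (md5, "00c0ffee")         # (hashfun, targethex) tuple
checksum_vector = ["00c0ffee", "00beef00"]  # one checksum per file
checksum_devfun = sha2_256                  # hashfun only: warns with the hash-line to add
checksum_any    = Any                       # don't check; the data may change
```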

run_checksum(checksum, path)

This runs the checksum on the files at the fetched path, and returns true or false based on whether the checksum matches (always true if no target sum is given). It is fairly flexible and accepts different kinds of input to give different kinds of results.

If path (the second parameter) is a Vector, then unless checksum is also a Vector, the result is the xor of all the file checksums.

Use Any to mark as not caring about the hash. Use this for data that can change.

DataDeps.run_fetch (Method)
run_fetch(fetch_method, remotepath, localdir)

Executes the fetch_method on the given remote path, downloading into the local directory, and returns the local path(s). Runs in (async) parallel if multiple paths are given.

run_post_fetch(post_fetch_method, fetched_path)

Executes the post_fetch_method on the given fetched path. Runs in (async) parallel if multiple paths are given.

DataDeps.splitpath (Method)
splitpath(path)

The opposite of joinpath: splits a path into each of its directory names, and the filename (for the last component).
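
For example (a sketch of the expected behaviour):

```julia
DataDeps.splitpath("foo/bar/baz.txt")  # expected to give ["foo", "bar", "baz.txt"]
```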

try_determine_load_path(name)

Tries to find a local path to the datadep with the given name. If it fails, then it returns nothing.

try_determine_package_datadeps_dir(filepath)

Takes a path to a file. If that path is in a package's folder, then this returns a path to the deps/data dir for that package (as a Nullable), which may or may not exist. If not in a package, returns null.

DataDeps.uv_access (Method)
uv_access(path, mode)

Check access to a path. Returns 2 results: first an error code (0 for all good), and second an error message. See https://stackoverflow.com/a/47126837/179081.