Module

class Module(base, ensure_exists=True)[source]

Bases: object

The class wrapping the directory lookup implementation.

Initialize the module.

Parameters:
  • base (Union[str, Path]) – The base directory for the module

  • ensure_exists (bool) – Should the base directory be created automatically? Defaults to true.

Methods Summary

dump_df(*subkeys, name, obj[, sep, index, ...])

Dump a dataframe to a TSV file with pandas.

dump_json(*subkeys, name, obj[, ...])

Dump an object to a file with json.

dump_pickle(*subkeys, name, obj[, mode, ...])

Dump an object to a file with pickle.

dump_rdf(*subkeys, name, obj[, format, ...])

Dump an RDF graph to a file with rdflib.

dump_xml(*subkeys, name, obj[, open_kwargs, ...])

Dump an XML element tree to a file with lxml.

ensure(*subkeys, url[, name, force, ...])

Ensure a file is downloaded.

ensure_csv(*subkeys, url[, name, force, ...])

Download a CSV and open as a dataframe with pandas.

ensure_custom(*subkeys, name[, force])

Ensure a file is present, and run a custom create function otherwise.

ensure_excel(*subkeys, url[, name, force, ...])

Download an Excel file and open as a dataframe with pandas.

ensure_from_google(*subkeys, name, file_id)

Ensure a file is downloaded from Google Drive.

ensure_from_s3(*subkeys, s3_bucket, s3_key)

Ensure a file is downloaded from S3.

ensure_gunzip(*subkeys, url[, name, force, ...])

Ensure a gzipped file is downloaded and decompressed.

ensure_json(*subkeys, url[, name, force, ...])

Download JSON and open with json.

ensure_json_bz2(*subkeys, url[, name, ...])

Download BZ2-compressed JSON and open with json.

ensure_open(*subkeys, url[, name, force, ...])

Ensure a file is downloaded and open it.

ensure_open_bz2(*subkeys, url[, name, ...])

Ensure a BZ2-compressed file is downloaded and open a file inside it.

ensure_open_gz(*subkeys, url[, name, force, ...])

Ensure a gzipped file is downloaded and open a file inside it.

ensure_open_lzma(*subkeys, url[, name, ...])

Ensure an LZMA-compressed file is downloaded and open a file inside it.

ensure_open_sqlite(*subkeys, url[, name, ...])

Ensure and connect to a SQLite database.

ensure_open_sqlite_gz(*subkeys, url[, name, ...])

Ensure and connect to a SQLite database that's gzipped.

ensure_open_tarfile(*subkeys, url, inner_path)

Ensure a tar file is downloaded and open a file inside it.

ensure_open_zip(*subkeys, url, inner_path[, ...])

Ensure a file is downloaded then open it with zipfile.

ensure_pickle(*subkeys, url[, name, force, ...])

Download a pickle file and open with pickle.

ensure_pickle_gz(*subkeys, url[, name, ...])

Download a gzipped pickle file and open with pickle.

ensure_rdf(*subkeys, url[, name, force, ...])

Download an RDF file and open with rdflib.

ensure_tar_df(*subkeys, url, inner_path[, ...])

Download a tar file and open an inner file as a dataframe with pandas.

ensure_tar_xml(*subkeys, url, inner_path[, ...])

Download a tar file and open an inner file as XML with lxml.

ensure_untar(*subkeys, url[, name, ...])

Ensure a tar file is downloaded and unarchived.

ensure_xml(*subkeys, url[, name, force, ...])

Download an XML file and open it with lxml.

ensure_zip_df(*subkeys, url, inner_path[, ...])

Download a zip file and open an inner file as a dataframe with pandas.

ensure_zip_np(*subkeys, url, inner_path[, ...])

Download a zip file and open an inner file as an array-like with numpy.

from_key(key, *subkeys[, ensure_exists])

Get a module for the given directory or one of its subdirectories.

join(*subkeys[, name, ensure_exists])

Get a subdirectory of the current module.

joinpath_sqlite(*subkeys, name)

Get an SQLite database connection string.

load_df(*subkeys, name[, read_csv_kwargs])

Open a pre-existing CSV as a dataframe with pandas.

load_json(*subkeys, name[, open_kwargs, ...])

Open a JSON file with json.

load_pickle(*subkeys, name[, mode, ...])

Open a pickle file with pickle.

load_pickle_gz(*subkeys, name[, mode, ...])

Open a gzipped pickle file with pickle.

load_rdf(*subkeys[, name, parse_kwargs])

Open an RDF file with rdflib.

load_xml(*subkeys, name[, parse_kwargs])

Load an XML file with lxml.

module(*subkeys[, ensure_exists])

Get a module for a subdirectory of the current module.

open(*subkeys, name[, mode, open_kwargs, ...])

Open a file that exists already.

open_gz(*subkeys, name[, mode, open_kwargs, ...])

Open a gzipped file that exists already.

Methods Documentation

dump_df(*subkeys, name, obj, sep='\t', index=False, to_csv_kwargs=None)[source]

Dump a dataframe to a TSV file with pandas.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the file to write

  • obj (DataFrame) – The dataframe to dump

  • sep (str) – The separator to use, defaults to a tab

  • index (bool) – Should the index be dumped? Defaults to false.

  • to_csv_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pandas.DataFrame.to_csv().

Return type:

None

dump_json(*subkeys, name, obj, open_kwargs=None, json_dump_kwargs=None)[source]

Dump an object to a file with json.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the file to open

  • obj (Any) – The object to dump

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()

  • json_dump_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to json.dump().

Return type:

None
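The forwarding of the two keyword mappings can be pictured with a minimal standard-library sketch. The function name `dump_json_sketch` and its direct use of a path are hypothetical and only illustrate the idea; only open() and json.dump() are real calls here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Mapping, Optional


def dump_json_sketch(
    path: Path,
    obj: Any,
    open_kwargs: Optional[Mapping[str, Any]] = None,
    json_dump_kwargs: Optional[Mapping[str, Any]] = None,
) -> None:
    """Write obj to path, forwarding each mapping to its own call."""
    # A None mapping is treated as "no extra arguments"
    with open(path, "w", **(open_kwargs or {})) as file:
        json.dump(obj, file, **(json_dump_kwargs or {}))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.json"
    dump_json_sketch(
        target,
        {"b": 1, "a": 2},
        open_kwargs={"encoding": "utf-8"},
        json_dump_kwargs={"sort_keys": True},
    )
    written = target.read_text(encoding="utf-8")
```

Keeping the two mappings separate means an option such as encoding reaches open() while an option such as sort_keys reaches json.dump(), with no risk of one call receiving the other's arguments.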

dump_pickle(*subkeys, name, obj, mode='wb', open_kwargs=None, pickle_dump_kwargs=None)[source]

Dump an object to a file with pickle.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the file to open

  • obj (Any) – The object to dump

  • mode (str) – The write mode, passed to open()

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()

  • pickle_dump_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pickle.dump().

Return type:

None

dump_rdf(*subkeys, name, obj, format='turtle', serialize_kwargs=None)[source]

Dump an RDF graph to a file with rdflib.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the file to open

  • obj (Graph) – The object to dump

  • format (str) – The format to dump in

  • serialize_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to rdflib.Graph.serialize().

dump_xml(*subkeys, name, obj, open_kwargs=None, write_kwargs=None)[source]

Dump an XML element tree to a file with lxml.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the file to open

  • obj (ElementTree) – The object to dump

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()

  • write_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to lxml.etree.ElementTree.write().

ensure(*subkeys, url, name=None, force=False, download_kwargs=None)[source]

Ensure a file is downloaded.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

Return type:

Path

Returns:

The path of the file that has been downloaded (or already exists)
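The download-if-missing behavior described by url, name, and force can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: `ensure_sketch` is a hypothetical helper, it uses urllib.request.urlretrieve() rather than pystow.utils.download(), and a file:// URL stands in for a remote server so the example runs offline.

```python
import tempfile
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse
from urllib.request import urlretrieve


def ensure_sketch(directory: Path, url: str, name: Optional[str] = None, force: bool = False) -> Path:
    """Download url into directory unless the target already exists."""
    # Without an explicit name, fall back to the last segment of the URL path
    filename = name or Path(urlparse(url).path).name
    path = directory / filename
    if force or not path.exists():
        urlretrieve(url, path)
    return path


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.txt"
    source.write_text("first")
    cache = Path(tmp) / "cache"
    cache.mkdir()

    path = ensure_sketch(cache, source.as_uri())
    downloaded_name = path.name
    first_read = path.read_text()

    # Changing the source does not affect the cached copy unless force is set
    source.write_text("second")
    cached_read = ensure_sketch(cache, source.as_uri()).read_text()
    forced_read = ensure_sketch(cache, source.as_uri(), force=True).read_text()
```

The second call returns the stale cached copy, which is the intended trade-off: repeated calls are cheap, and force=True is the explicit way to refresh.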

ensure_csv(*subkeys, url, name=None, force=False, download_kwargs=None, read_csv_kwargs=None)[source]

Download a CSV and open as a dataframe with pandas.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • read_csv_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pandas.read_csv().

Returns:

A pandas DataFrame

Return type:

pandas.DataFrame

ensure_custom(*subkeys, name, force=False, provider, **kwargs)[source]

Ensure a file is present, and run a custom create function otherwise.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The file name.

  • force (bool) – Should the file be re-created, even if the path already exists?

  • provider (Callable[..., None]) – The file provider. Will be run with the path as the first positional argument, if the file needs to be generated.

  • kwargs – Additional keyword-based parameters passed to the provider.

Raises:

ValueError – If the provider was called but the file was not created by it.

Return type:

Path

Returns:

The path of the file that has been created (or already exists)
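The provider contract described above (called with the path first, skipped when the file exists, ValueError when nothing is written) can be sketched as follows. `ensure_custom_sketch`, `write_greeting`, and `do_nothing` are hypothetical names used only for this illustration.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable


def ensure_custom_sketch(path: Path, provider: Callable[..., None], force: bool = False, **kwargs: Any) -> Path:
    """Return path, calling provider(path, **kwargs) first if the file is needed."""
    if path.is_file() and not force:
        return path
    provider(path, **kwargs)
    if not path.is_file():
        # Mirrors the documented ValueError for a provider that writes nothing
        raise ValueError(f"provider did not create {path}")
    return path


def write_greeting(path: Path, greeting: str = "hello") -> None:
    path.write_text(greeting)


def do_nothing(path: Path) -> None:
    pass


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "greeting.txt"
    ensure_custom_sketch(target, write_greeting, greeting="hi")
    # The second call finds the file and skips the provider
    ensure_custom_sketch(target, write_greeting, greeting="ignored")
    content = target.read_text()

    try:
        ensure_custom_sketch(Path(tmp) / "missing.txt", do_nothing)
        raised = False
    except ValueError:
        raised = True
```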

ensure_excel(*subkeys, url, name=None, force=False, download_kwargs=None, read_excel_kwargs=None)[source]

Download an Excel file and open as a dataframe with pandas.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • read_excel_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pandas.read_excel().

Return type:

DataFrame

Returns:

A pandas DataFrame

ensure_from_google(*subkeys, name, file_id, force=False, download_kwargs=None)[source]

Ensure a file is downloaded from Google Drive.

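Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the file

  • file_id (str) – The identifier of the file on Google Drive

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to the download function.

Return type: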

Path

Returns:

The path of the file that has been downloaded (or already exists)

ensure_from_s3(*subkeys, s3_bucket, s3_key, name=None, client=None, client_kwargs=None, download_file_kwargs=None, force=False)[source]

Ensure a file is downloaded from S3.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • s3_bucket (str) – The S3 bucket name

  • s3_key (Union[str, Sequence[str]]) – The S3 key name

  • name (Optional[str]) – Overrides the name of the file at the end of the S3 key, if given.

  • client (Optional[BaseClient]) – A botocore client. If none given, one will be created automatically

  • client_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to be passed to the client on instantiation.

  • download_file_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to be passed to boto3.s3.transfer.S3Transfer.download_file()

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

Return type:

Path

Returns:

The path of the file that has been downloaded (or already exists)

ensure_gunzip(*subkeys, url, name=None, force=False, autoclean=True, download_kwargs=None)[source]

Ensure a gzipped file is downloaded and decompressed.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • autoclean (bool) – Should the gzipped file be deleted after decompression?

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

Return type:

Path

Returns:

The path of the directory where the file that has been downloaded gets extracted to
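As a rough sketch of the decompress-then-clean step that autoclean controls, the standard library is enough. `gunzip_sketch` is a hypothetical helper; it works on a local file, skips the download step entirely, and does not claim to reproduce the library's return value.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_sketch(gz_path: Path, autoclean: bool = True) -> Path:
    """Decompress gz_path next to itself and return the decompressed path."""
    # "data.txt.gz" becomes "data.txt"
    out_path = gz_path.with_suffix("")
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if autoclean:
        gz_path.unlink()
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "data.txt.gz"
    with gzip.open(archive, "wb") as file:
        file.write(b"payload")
    result = gunzip_sketch(archive)
    result_name = result.name
    result_bytes = result.read_bytes()
    archive_removed = not archive.exists()
```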

ensure_json(*subkeys, url, name=None, force=False, download_kwargs=None, open_kwargs=None, json_load_kwargs=None)[source]

Download JSON and open with json.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()

  • json_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to json.load().

Return type:

Any

Returns:

A JSON object (list, dict, etc.)

ensure_json_bz2(*subkeys, url, name=None, force=False, download_kwargs=None, open_kwargs=None, json_load_kwargs=None)[source]

Download BZ2-compressed JSON and open with json.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to bz2.open()

  • json_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to json.load().

Returns:

A JSON object (list, dict, etc.)

ensure_open(*subkeys, url, name=None, force=False, download_kwargs=None, mode='r', open_kwargs=None)[source]

Ensure a file is downloaded and open it.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • mode (str) – The read mode, passed to open()

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()

Yields:

An open file object

Return type:

Iterator[IO]
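The "Yields" entry together with the Iterator[IO] return type suggests a generator-based context manager. A minimal sketch of that pattern, leaving out the download step, might look like this; `open_sketch` is a hypothetical name.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import IO, Any, Iterator, Mapping, Optional


@contextmanager
def open_sketch(path: Path, mode: str = "r", open_kwargs: Optional[Mapping[str, Any]] = None) -> Iterator[IO]:
    """Yield an open file and close it when the with block exits."""
    with open(path, mode, **(open_kwargs or {})) as file:
        yield file


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "notes.txt"
    target.write_text("line one\nline two\n", encoding="utf-8")
    with open_sketch(target, open_kwargs={"encoding": "utf-8"}) as file:
        lines = [line.strip() for line in file]
    closed_after = file.closed
```

Used this way, the caller never has to close the file explicitly; leaving the with block does it.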

ensure_open_bz2(*subkeys, url, name=None, force=False, download_kwargs=None, mode='rb', open_kwargs=None)[source]

Ensure a BZ2-compressed file is downloaded and open a file inside it.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • mode (str) – The read mode, passed to bz2.open()

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to bz2.open()

Yields:

An open file object

Return type:

Iterator[IO]

ensure_open_gz(*subkeys, url, name=None, force=False, download_kwargs=None, mode='rb', open_kwargs=None)[source]

Ensure a gzipped file is downloaded and open a file inside it.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • mode (str) – The read mode, passed to gzip.open()

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to gzip.open()

Yields:

An open file object

Return type:

Iterator[IO]

ensure_open_lzma(*subkeys, url, name=None, force=False, download_kwargs=None, mode='rt', open_kwargs=None)[source]

Ensure an LZMA-compressed file is downloaded and open a file inside it.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • mode (str) – The read mode, passed to lzma.open()

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to lzma.open()

Yields:

An open file object

Return type:

Iterator[IO]
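The gzip, BZ2, and LZMA variants pass mode and open_kwargs to gzip.open(), bz2.open(), and lzma.open() respectively. Those three standard-library functions accept the same mode and encoding arguments, which the following sketch demonstrates on locally created files (the file names are arbitrary and no download is involved). Note that the documented defaults differ: 'rb' for the gzip and BZ2 variants, 'rt' for the LZMA variant.

```python
import bz2
import gzip
import lzma
import tempfile
from pathlib import Path

payload = "compressed text\n"
# Each stdlib module pairs compress() with an open() that accepts a mode
openers = {
    "data.gz": (gzip.compress, gzip.open),
    "data.bz2": (bz2.compress, bz2.open),
    "data.xz": (lzma.compress, lzma.open),
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for filename, (compress, opener) in openers.items():
        path = Path(tmp) / filename
        path.write_bytes(compress(payload.encode("utf-8")))
        # "rt" decodes to str; "rb" would return bytes instead
        with opener(path, mode="rt", encoding="utf-8") as file:
            results[filename] = file.read()
```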

ensure_open_sqlite(*subkeys, url, name=None, force=False, download_kwargs=None)[source]

Ensure and connect to a SQLite database.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

Yields:

An instance of sqlite3.Connection from sqlite3.connect()

Example usage:

>>> import pystow
>>> import pandas as pd
>>> url = "https://s3.amazonaws.com/bbop-sqlite/hp.db"
>>> sql = "SELECT * FROM entailed_edge LIMIT 10"
>>> module = pystow.module("test")
>>> with module.ensure_open_sqlite(url=url) as conn:
...     df = pd.read_sql(sql, conn)
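For readers who want to see the connection-yielding pattern without downloading anything, here is a small offline sketch. `open_sqlite_sketch` is hypothetical, and the table it creates is made up for the demonstration; only sqlite3.connect() and contextlib.closing() are real calls.

```python
import sqlite3
import tempfile
from contextlib import closing, contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def open_sqlite_sketch(path: Path) -> Iterator[sqlite3.Connection]:
    """Yield a connection to the database at path and close it afterwards."""
    with closing(sqlite3.connect(path.as_posix())) as conn:
        yield conn


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "example.db"
    with open_sqlite_sketch(db_path) as conn:
        conn.execute("CREATE TABLE edge (subject TEXT, object TEXT)")
        conn.execute("INSERT INTO edge VALUES ('a', 'b')")
        conn.commit()
    with open_sqlite_sketch(db_path) as conn:
        rows = conn.execute("SELECT * FROM edge LIMIT 10").fetchall()
```

closing() is used because a sqlite3.Connection used directly as a context manager only commits or rolls back the transaction; it does not close the connection.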

ensure_open_sqlite_gz(*subkeys, url, name=None, force=False, download_kwargs=None)[source]

Ensure and connect to a SQLite database that’s gzipped.

Unfortunately, it’s a paid feature to directly read gzipped sqlite files, so this automatically gunzips it first.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

Yields:

An instance of sqlite3.Connection from sqlite3.connect()

Example usage:

>>> import pystow
>>> import pandas as pd
>>> url = "https://s3.amazonaws.com/bbop-sqlite/hp.db.gz"
>>> module = pystow.module("test")
>>> sql = "SELECT * FROM entailed_edge LIMIT 10"
>>> with module.ensure_open_sqlite_gz(url=url) as conn:
...     df = pd.read_sql(sql, conn)

ensure_open_tarfile(*subkeys, url, inner_path, name=None, force=False, download_kwargs=None, mode='r', open_kwargs=None)[source]

Ensure a tar file is downloaded and open a file inside it.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • inner_path (str) – The relative path to the file inside the archive

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • mode (str) – The read mode, passed to tarfile.open()

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to tarfile.open()

Yields:

An open file object

Return type:

Iterator[IO]
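What inner_path refers to can be seen in a standard-library sketch that builds a small archive and then reads a single member from it. The archive and member names are invented for the example, and the download step is omitted.

```python
import io
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "bundle.tar.gz"
    data = b"id\tname\n1\talpha\n"

    # Build a small archive with one member at a nested inner path
    with tarfile.open(archive, "w:gz") as tar:
        info = tarfile.TarInfo(name="inner/table.tsv")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    # mode="r" lets tarfile detect the compression transparently
    with tarfile.open(archive, mode="r") as tar:
        member = tar.extractfile("inner/table.tsv")
        inner_bytes = member.read()
```

The member is read straight from the archive, so nothing is extracted to disk.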

ensure_open_zip(*subkeys, url, inner_path, name=None, force=False, download_kwargs=None, mode='r', open_kwargs=None)[source]

Ensure a file is downloaded then open it with zipfile.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • inner_path (str) – The relative path to the file inside the archive

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • mode (str) – The read mode, passed to zipfile.ZipFile.open()

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to zipfile.ZipFile.open()

Yields:

An open file object

Return type:

Iterator[IO]
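The zip counterpart works the same way: the archive is opened with zipfile.ZipFile and one member is read through its open() method. The sketch below uses invented file names and a locally created archive in place of a download.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "bundle.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("inner/notes.txt", "hello from the archive")

    with zipfile.ZipFile(archive) as zf:
        # ZipFile.open() returns a binary file-like object for one member
        with zf.open("inner/notes.txt", mode="r") as file:
            inner_text = file.read().decode("utf-8")
```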

ensure_pickle(*subkeys, url, name=None, force=False, download_kwargs=None, mode='rb', open_kwargs=None, pickle_load_kwargs=None)[source]

Download a pickle file and open with pickle.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • mode (str) – The read mode, passed to open()

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()

  • pickle_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pickle.load().

Return type:

Any

Returns:

Any object

ensure_pickle_gz(*subkeys, url, name=None, force=False, download_kwargs=None, mode='rb', open_kwargs=None, pickle_load_kwargs=None)[source]

Download a gzipped pickle file and open with pickle.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • mode (str) – The read mode, passed to gzip.open()

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to gzip.open()

  • pickle_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pickle.load().

Return type:

Any

Returns:

Any object

ensure_rdf(*subkeys, url, name=None, force=False, download_kwargs=None, precache=True, parse_kwargs=None)[source]

Download an RDF file and open with rdflib.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • precache (bool) – Should the parsed rdflib.Graph be stored as a pickle for fast loading?

  • parse_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.read_rdf() and transitively to rdflib.Graph.parse().

Returns:

An RDF graph

Return type:

rdflib.Graph
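The idea behind precache, parsing once and reloading a pickle afterwards, can be sketched generically. This is not the library's implementation: `load_with_precache` and `slow_parse` are hypothetical, the cache file name is an assumption, and a plain text splitter stands in for rdflib so the example needs no third-party packages.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_with_precache(path: Path, parse: Callable[[Path], Any], precache: bool = True) -> Any:
    """Parse path, reusing a pickle stored beside it when precache is on."""
    cache_path = path.with_suffix(path.suffix + ".pickle")
    if precache and cache_path.is_file():
        with open(cache_path, "rb") as file:
            return pickle.load(file)
    result = parse(path)
    if precache:
        with open(cache_path, "wb") as file:
            pickle.dump(result, file)
    return result


calls = []


def slow_parse(path: Path) -> list:
    # Stand-in for an expensive parser such as rdflib.Graph.parse()
    calls.append(path.name)
    return path.read_text().split()


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "triples.txt"
    source.write_text("s p o")
    first = load_with_precache(source, slow_parse)
    second = load_with_precache(source, slow_parse)
```

The parser runs only on the first call. Since unpickling can execute arbitrary code, a cache like this should only be read from directories you control.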

ensure_tar_df(*subkeys, url, inner_path, name=None, force=False, download_kwargs=None, read_csv_kwargs=None)[source]

Download a tar file and open an inner file as a dataframe with pandas.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • inner_path (str) – The relative path to the file inside the archive

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • read_csv_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pandas.read_csv().

Return type:

DataFrame

Returns:

A dataframe

Warning

If you have lots of files to read in the same archive, it’s better just to unzip first.

ensure_tar_xml(*subkeys, url, inner_path, name=None, force=False, download_kwargs=None, parse_kwargs=None)[source]

Download a tar file and open an inner file as XML with lxml.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • inner_path (str) – The relative path to the file inside the archive

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • parse_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to lxml.etree.parse().

Returns:

An ElementTree object

Warning

If you have lots of files to read in the same archive, it’s better just to unzip first.

ensure_untar(*subkeys, url, name=None, directory=None, force=False, download_kwargs=None, extract_kwargs=None)[source]

Ensure a tar file is downloaded and unarchived.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • directory (Optional[str]) – Overrides the name of the directory into which the tar archive is extracted. If none given, will use the stem of the file name that gets downloaded.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • extract_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to tarfile.TarFile.extractall().

Return type:

Path

Returns:

The path of the directory where the file that has been downloaded gets extracted to
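The relationship between the archive name, the directory override, and the returned path can be sketched as below. `untar_sketch` is hypothetical, and its default directory name (the file name up to the first dot) is an assumption about what "stem" means here; the library may derive it differently.

```python
import io
import tarfile
import tempfile
from pathlib import Path
from typing import Optional


def untar_sketch(archive: Path, directory: Optional[str] = None) -> Path:
    """Extract archive into a sibling directory and return that directory."""
    # Assumed default: the name up to its first dot, e.g. "bundle" for "bundle.tar.gz"
    target = archive.parent / (directory or archive.name.split(".")[0])
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "bundle.tar.gz"
    data = b"content"
    with tarfile.open(archive, "w:gz") as tar:
        info = tarfile.TarInfo(name="readme.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    out_dir = untar_sketch(archive)
    out_name = out_dir.name
    extracted = (out_dir / "readme.txt").read_bytes()
```

For archives from untrusted sources, recent Python versions also accept a filter argument to extractall() that rejects unsafe members.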

ensure_xml(*subkeys, url, name=None, force=False, download_kwargs=None, parse_kwargs=None)[source]

Download an XML file and open it with lxml.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • parse_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to lxml.etree.parse().

Return type:

ElementTree

Returns:

An ElementTree object

Warning

If you have lots of files to read in the same archive, it’s better just to unzip first.

ensure_zip_df(*subkeys, url, inner_path, name=None, force=False, download_kwargs=None, read_csv_kwargs=None)[source]

Download a zip file and open an inner file as a dataframe with pandas.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • inner_path (str) – The relative path to the file inside the archive

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • read_csv_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pandas.read_csv().

Returns:

A pandas DataFrame

Return type:

pandas.DataFrame

ensure_zip_np(*subkeys, url, inner_path, name=None, force=False, download_kwargs=None, load_kwargs=None)[source]

Download a zip file and open an inner file as an array-like with numpy.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • url (str) – The URL to download.

  • inner_path (str) – The relative path to the file inside the archive

  • name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.

  • force (bool) – Should the download be done again, even if the path already exists? Defaults to false.

  • download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().

  • load_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments that are passed through to read_zip_np() and transitively to numpy.load().

Returns:

An array-like object

Return type:

numpy.typing.ArrayLike

classmethod from_key(key, *subkeys, ensure_exists=True)[source]

Get a module for the given directory or one of its subdirectories.

Parameters:
  • key (str) – The name of the module. No funny characters. The environment variable <KEY>_HOME (the key, uppercased) is checked first; if it is not set, the default home directory is used.

  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • ensure_exists (bool) – Should all directories be created automatically? Defaults to true.

Return type:

Module

Returns:

A module
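
The lookup order described for key in from_key() can be sketched as follows. This is an illustration only and not pystow's code: the helper resolve_base, the key "myapp", and both directories are hypothetical. Placing the fallback in a subdirectory named after the key is also an assumption.

```python
import os
from pathlib import Path


def resolve_base(key, default_home):
    """Resolve the base directory for a key (hypothetical helper)."""
    # The environment variable is named after the uppercased key, e.g. MYAPP_HOME
    env_value = os.environ.get(f"{key.upper()}_HOME")
    if env_value:
        return Path(env_value)
    # Assumption: the fallback is a folder named after the key in the default home
    return Path(default_home) / key


# Without the environment variable, the default home directory is used.
os.environ.pop("MYAPP_HOME", None)
fallback = resolve_base("myapp", default_home="/tmp/data")

# With the environment variable set, it takes precedence.
os.environ["MYAPP_HOME"] = "/tmp/custom"
overridden = resolve_base("myapp", default_home="/tmp/data")
```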

join(*subkeys, name=None, ensure_exists=True)[source]

Get a subdirectory of the current module.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • ensure_exists (bool) – Should all directories be created automatically? Defaults to true.

  • name (Optional[str]) – The name of the file (optional) inside the folder

Return type:

Path

Returns:

The path of the directory or subdirectory for the given module.
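
How subkeys, name, and ensure_exists interact in join() can be approximated with pathlib. The function below is a standalone sketch under the assumption that only directories are created and that name is appended as a file name. It is not the library's implementation.

```python
import tempfile
from pathlib import Path


def join(base, *subkeys, name=None, ensure_exists=True):
    """Join subkeys onto a base directory (illustrative sketch only)."""
    directory = Path(base).joinpath(*subkeys)
    if ensure_exists:
        # Create the directory and any missing parents
        directory.mkdir(parents=True, exist_ok=True)
    # A name, if given, refers to a file inside that directory
    return directory if name is None else directory / name


with tempfile.TemporaryDirectory() as base:
    directory = join(base, "raw", "v1")
    directory_exists = directory.is_dir()
    file_path = join(base, "raw", "v1", name="data.tsv")
    file_exists = file_path.exists()
```

In this sketch the directory is created on the first call, while the file path is only computed and the file itself is never written.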

joinpath_sqlite(*subkeys, name)[source]

Get an SQLite database connection string.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the database file.

Return type:

str

Returns:

An SQLite path string.
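
The exact format of the string returned by joinpath_sqlite() is not spelled out above. The sketch below assumes the SQLAlchemy-style URL form, sqlite:/// followed by the file path. The helper name and the file name are hypothetical.

```python
import tempfile
from pathlib import Path


def sqlite_connection_string(directory, name):
    """Format a SQLAlchemy-style SQLite URL for a file (assumed format)."""
    path = Path(directory) / name
    # Three slashes introduce the file path; as_posix() gives forward slashes
    return f"sqlite:///{path.as_posix()}"


with tempfile.TemporaryDirectory() as directory:
    connection_string = sqlite_connection_string(directory, "cache.db")
```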

load_df(*subkeys, name, read_csv_kwargs=None)[source]

Open a pre-existing CSV as a dataframe with pandas.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the file to open

  • read_csv_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pandas.read_csv().

Return type:

DataFrame

Returns:

A pandas DataFrame

load_json(*subkeys, name, open_kwargs=None, json_load_kwargs=None)[source]

Open a JSON file with json.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the file to open

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()

  • json_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to json.load().

Return type:

Any

Returns:

A JSON object (list, dict, etc.)
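
The two keyword-argument mappings accepted by load_json() go to different calls: open_kwargs to open() and json_load_kwargs to json.load(). The standalone function below is a sketch of that pass-through pattern, not the library's code, and it takes a full path rather than subkeys and a name. The file name is hypothetical.

```python
import json
import tempfile
from decimal import Decimal
from pathlib import Path


def load_json(path, open_kwargs=None, json_load_kwargs=None):
    """Open a file and parse it as JSON, forwarding both groups of keyword arguments."""
    with open(path, **(open_kwargs or {})) as file:
        return json.load(file, **(json_load_kwargs or {}))


with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "config.json"
    path.write_text('{"threshold": 0.25}', encoding="utf-8")
    data = load_json(
        path,
        open_kwargs={"encoding": "utf-8"},  # forwarded to open()
        json_load_kwargs={"parse_float": Decimal},  # forwarded to json.load()
    )
```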

load_pickle(*subkeys, name, mode='rb', open_kwargs=None, pickle_load_kwargs=None)[source]

Open a pickle file with pickle.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the file to open

  • mode (str) – The read mode, passed to open()

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()

  • pickle_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pickle.load().

Return type:

Any

Returns:

Any object

load_pickle_gz(*subkeys, name, mode='rb', open_kwargs=None, pickle_load_kwargs=None)[source]

Open a gzipped pickle file with pickle.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the file to open

  • mode (str) – The read mode, passed to open()

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to gzip.open()

  • pickle_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pickle.load().

Return type:

Any

Returns:

Any object
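
What load_pickle_gz() combines can be shown with a round trip through gzip.open() and pickle. This is a minimal sketch that takes a full path rather than subkeys and a name. It is not the library's implementation, and the file name is hypothetical.

```python
import gzip
import pickle
import tempfile
from pathlib import Path


def load_pickle_gz(path, mode="rb", open_kwargs=None, pickle_load_kwargs=None):
    """Read a pickled object from a gzip-compressed file (illustrative sketch)."""
    with gzip.open(path, mode=mode, **(open_kwargs or {})) as file:
        return pickle.load(file, **(pickle_load_kwargs or {}))


with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "object.pkl.gz"
    # Write a compressed pickle first so there is something to load
    with gzip.open(path, mode="wb") as file:
        pickle.dump({"a": [1, 2, 3]}, file)
    restored = load_pickle_gz(path)
```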

load_rdf(*subkeys, name=None, parse_kwargs=None)[source]

Open an RDF file with rdflib.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (Optional[str]) – The name of the file to open

  • parse_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.read_rdf() and transitively to rdflib.Graph.parse().

Return type:

Graph

Returns:

An RDF graph

load_xml(*subkeys, name, parse_kwargs=None)[source]

Load an XML file with lxml.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the file to open

  • parse_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to lxml.etree.parse().

Return type:

ElementTree

Returns:

An ElementTree object

Warning

If you have lots of files to read in the same archive, it’s better just to unzip first.

module(*subkeys, ensure_exists=True)[source]

Get a module for a subdirectory of the current module.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • ensure_exists (bool) – Should all directories be created automatically? Defaults to true.

Return type:

Module

Returns:

A module representing the subdirectory based on the given subkeys.

open(*subkeys, name, mode='r', open_kwargs=None, ensure_exists=False)[source]

Open a file that exists already.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the file to open

  • mode (str) – The file mode, passed to open()

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()

  • ensure_exists (bool) – Should the file be made? Set to true on write operations.

Yields:

An open file object

Return type:

Iterator[IO]
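
Because open() yields the file object rather than returning it, it is meant to be used in a with statement. The sketch below imitates that shape with contextlib.contextmanager. The name open_in is hypothetical, and the assumption is that ensure_exists creates the containing directory before a write.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def open_in(directory, name, mode="r", open_kwargs=None, ensure_exists=False):
    """Yield an open file from a directory (illustrative sketch only)."""
    directory = Path(directory)
    if ensure_exists:
        # Assumption: this creates the containing directory before a write
        directory.mkdir(parents=True, exist_ok=True)
    with open(directory / name, mode=mode, **(open_kwargs or {})) as file:
        yield file


with tempfile.TemporaryDirectory() as base:
    target = Path(base) / "notes"
    # Writing: the directory does not exist yet, so ensure_exists is set
    with open_in(target, "log.txt", mode="w", ensure_exists=True) as file:
        file.write("hello\n")
    # Reading: the defaults are enough for a file that already exists
    with open_in(target, "log.txt") as file:
        content = file.read()
```

The file is closed when the with block exits, whether or not an exception was raised inside it.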

open_gz(*subkeys, name, mode='rt', open_kwargs=None, ensure_exists=False)[source]

Open a gzipped file that exists already.

Parameters:
  • subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.

  • name (str) – The name of the file to open

  • mode (str) – The file mode, passed to gzip.open()

  • open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to gzip.open()

  • ensure_exists (bool) – Should the file be made? Set to true on write operations.

Yields:

An open file object

Return type:

Iterator[IO]
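
The default mode of open_gz() is 'rt', whereas gzip.open() itself defaults to 'rb'. The sketch below uses gzip.open() directly to show the difference between the two modes. The file name is hypothetical.

```python
import gzip
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "lines.txt.gz"
    with gzip.open(path, mode="wt", encoding="utf-8") as file:
        file.write("first\nsecond\n")

    # Text mode ("rt") decodes the decompressed bytes and yields str
    with gzip.open(path, mode="rt", encoding="utf-8") as file:
        text_lines = file.read().splitlines()

    # gzip.open's own default mode ("rb") yields the decompressed bytes
    with gzip.open(path) as file:
        raw = file.read()
```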