Module
- class Module(base: str | Path, ensure_exists: bool = True)
Bases: object
The class wrapping the directory lookup implementation.
Initialize the module.
- Parameters:
base – The base directory for the module
ensure_exists – Should the base directory be created automatically? Defaults to true.
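The sketch below illustrates the directory lookup idea described by these two parameters. It is not pystow's implementation: `join_subkeys` is a hypothetical helper that only uses the standard library, and it assumes that "ensure exists" means creating the directory and any missing parents.

```python
import tempfile
from pathlib import Path


def join_subkeys(base: Path, *subkeys: str, ensure_exists: bool = True) -> Path:
    """Hypothetical helper: resolve a nested directory below ``base``."""
    directory = base.joinpath(*subkeys)
    if ensure_exists:
        # Create the directory (and any missing parents) on first use.
        directory.mkdir(parents=True, exist_ok=True)
    return directory


with tempfile.TemporaryDirectory() as tmp:
    created = join_subkeys(Path(tmp), "rhea", "v1")
    skipped = join_subkeys(Path(tmp), "other", ensure_exists=False)
    created_is_dir = created.is_dir()
    skipped_exists = skipped.exists()
```

With no subkeys, the helper returns the base directory itself, which mirrors the wording of the `subkeys` parameter used throughout this page.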
Methods Summary
dump_df(*subkeys, name, obj[, sep, index, ...]) – Dump a dataframe to a TSV file with pandas.
dump_json(*subkeys, name, obj[, ...]) – Dump an object to a file with json.
dump_pickle(*subkeys, name, obj[, mode, ...]) – Dump an object to a file with pickle.
dump_rdf(*subkeys, name, obj[, format, ...]) – Dump an RDF graph to a file with rdflib.
dump_xml(*subkeys, name, obj[, open_kwargs, ...]) – Dump an XML element tree to a file with lxml.
ensure(*subkeys, url[, name, version, ...]) – Ensure a file is downloaded.
ensure_csv(*subkeys, url[, name, force, ...]) – Download a CSV and open as a dataframe with pandas.
ensure_custom(*subkeys, name[, force]) – Ensure a file is present, and run a custom create function otherwise.
ensure_excel(*subkeys, url[, name, force, ...]) – Download an Excel file and open as a dataframe with pandas.
ensure_from_google(*subkeys, name, file_id) – Ensure a file is downloaded from Google Drive.
ensure_from_s3(*subkeys, s3_bucket, s3_key) – Ensure a file is downloaded.
ensure_gunzip(*subkeys, url[, name, force, ...]) – Ensure a tar.gz file is downloaded and unarchived.
ensure_json(*subkeys, url[, name, force, ...]) – Download JSON and open with json.
ensure_json_bz2(*subkeys, url[, name, ...]) – Download BZ2-compressed JSON and open with json.
ensure_open(...) – Ensure a file is downloaded and open it.
ensure_open_bz2(*subkeys, url[, name, ...]) – Ensure a BZ2-compressed file is downloaded and open a file inside it.
ensure_open_gz(...) – Ensure a gzipped file is downloaded and open a file inside it.
ensure_open_lzma(...) – Ensure an LZMA-compressed file is downloaded and open a file inside it.
ensure_open_sqlite(*subkeys, url[, name, ...]) – Ensure and connect to a SQLite database.
ensure_open_sqlite_gz(*subkeys, url[, name, ...]) – Ensure and connect to a SQLite database that's gzipped.
ensure_open_tarfile(...) – Ensure a tar file is downloaded and open a file inside it.
ensure_open_zip(...) – Ensure a file is downloaded then open it with zipfile.
ensure_pickle(*subkeys, url[, name, force, ...]) – Download a pickle file and open with pickle.
ensure_pickle_gz(*subkeys, url[, name, ...]) – Download a gzipped pickle file and open with pickle.
ensure_rdf(*subkeys, url[, name, force, ...]) – Download an RDF file and open with rdflib.
ensure_soup(*subkeys, url[, name, version, ...]) – Ensure a webpage is downloaded and parsed with BeautifulSoup.
ensure_tar_df(*subkeys, url, inner_path[, ...]) – Download a tar file and open an inner file as a dataframe with pandas.
ensure_tar_xml(*subkeys, url, inner_path[, ...]) – Download a tar file and open an inner file as XML with lxml.
ensure_untar(*subkeys, url[, name, ...]) – Ensure a tar file is downloaded and unarchived.
ensure_xml(*subkeys, url[, name, force, ...]) – Download an XML file and open it with lxml.
ensure_yaml(*subkeys, url[, name, force, ...]) – Download YAML and open with yaml.
ensure_zip_df(*subkeys, url, inner_path[, ...]) – Download a zip file and open an inner file as a dataframe with pandas.
ensure_zip_np(*subkeys, url, inner_path[, ...]) – Download a zip file and open an inner file as an array-like with numpy.
from_key(key, *subkeys[, ensure_exists]) – Get a module for the given directory or one of its subdirectories.
join(...) – Get a subdirectory of the current module.
joinpath_sqlite(*subkeys, name) – Get an SQLite database connection string.
load_df(*subkeys, name[, read_csv_kwargs]) – Open a pre-existing CSV as a dataframe with pandas.
load_json(*subkeys, name[, open_kwargs, ...]) – Open a JSON file with json.
load_pickle(*subkeys, name[, mode, ...]) – Open a pickle file with pickle.
load_pickle_gz(*subkeys, name[, mode, ...]) – Open a gzipped pickle file with pickle.
load_rdf(*subkeys[, name, parse_kwargs]) – Open an RDF file with rdflib.
load_xml(*subkeys, name[, parse_kwargs]) – Load an XML file with lxml.
load_yaml(*subkeys, name[, open_kwargs, ...]) – Open a YAML file with yaml.
module(*subkeys[, ensure_exists]) – Get a module for a subdirectory of the current module.
open(...) – Open a file.
open_gz(...) – Open a gzipped file that exists already.
Methods Documentation
- dump_df(*subkeys: str, name: str, obj: pd.DataFrame, sep: str = '\t', index: bool = False, to_csv_kwargs: Mapping[str, Any] | None = None) -> None
Dump a dataframe to a TSV file with pandas.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to write.
obj – The dataframe to dump
sep – The separator to use, defaults to a tab
index – Should the index be dumped? Defaults to false.
to_csv_kwargs – Keyword arguments to pass through to
pandas.DataFrame.to_csv().
- dump_json(*subkeys: str, name: str, obj: Any, open_kwargs: Mapping[str, Any] | None = None, json_dump_kwargs: Mapping[str, Any] | None = None) -> None
Dump an object to a file with json.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to open
obj – The object to dump
open_kwargs – Additional keyword arguments passed to open().
json_dump_kwargs – Keyword arguments to pass through to json.dump().
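The two keyword mappings are forwarded to different calls. A minimal standard-library sketch of that forwarding, assuming the file is opened in text write mode (`dump_json_sketch` is a hypothetical stand-in, not the method itself):

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Mapping, Optional


def dump_json_sketch(
    path: Path,
    obj: Any,
    open_kwargs: Optional[Mapping[str, Any]] = None,
    json_dump_kwargs: Optional[Mapping[str, Any]] = None,
) -> None:
    """Hypothetical stand-in showing where each keyword mapping ends up."""
    # open_kwargs go to open(); json_dump_kwargs go to json.dump().
    with open(path, "w", **(open_kwargs or {})) as file:
        json.dump(obj, file, **(json_dump_kwargs or {}))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.json"
    dump_json_sketch(
        target,
        {"b": 1, "a": 2},
        open_kwargs={"encoding": "utf-8"},
        json_dump_kwargs={"indent": 2, "sort_keys": True},
    )
    dumped_text = target.read_text(encoding="utf-8")
```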
- dump_pickle(*subkeys: str, name: str, obj: Any, mode: Literal['wb'] = 'wb', open_kwargs: Mapping[str, Any] | None = None, pickle_dump_kwargs: Mapping[str, Any] | None = None) -> None
Dump an object to a file with pickle.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to open
obj – The object to dump
mode – The write mode, passed to open().
open_kwargs – Additional keyword arguments passed to open().
pickle_dump_kwargs – Keyword arguments to pass through to pickle.dump().
- dump_rdf(*subkeys: str, name: str, obj: rdflib.Graph, format: str = 'turtle', serialize_kwargs: Mapping[str, Any] | None = None) -> None
Dump an RDF graph to a file with rdflib.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to open
obj – The object to dump
format – The format to dump in
serialize_kwargs – Keyword arguments to pass through to rdflib.Graph.serialize().
- dump_xml(*subkeys: str, name: str, obj: lxml.etree.ElementTree, open_kwargs: Mapping[str, Any] | None = None, write_kwargs: Mapping[str, Any] | None = None) -> None
Dump an XML element tree to a file with lxml.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to open
obj – The object to dump
open_kwargs – Additional keyword arguments passed to open().
write_kwargs – Keyword arguments to pass through to lxml.etree.ElementTree.write().
- ensure(*subkeys: str, url: str, name: str | None = None, version: None | str | Callable[[], str | None] = None, force: bool = False, download_kwargs: DownloadKwargs | None = None) -> Path
Ensure a file is downloaded.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
version – The optional version, or no-argument callable that returns an optional version. This is prepended before the subkeys.
The following example describes how to store the versioned data from the Rhea database for biologically relevant chemical reactions.
    import pystow
    import requests

    def get_rhea_version() -> str:
        res = requests.get("https://ftp.expasy.org/databases/rhea/rhea-release.properties")
        _, _, version = res.text.splitlines()[0].partition("=")
        return version

    module = pystow.module("rhea")
    path = module.ensure(
        url="ftp://ftp.expasy.org/databases/rhea/rdf/rhea.rdf.gz",
        version=get_rhea_version,
    )
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to
pystow.utils.download().
- Returns:
The path of the file that has been downloaded (or already exists)
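To make the phrase "prepended before the subkeys" concrete, here is a small sketch of how a version hint could be resolved and placed in the path. The helper names `resolve_version` and `versioned_path` are hypothetical and the layout is an assumption drawn from the parameter description, not pystow's actual code.

```python
from pathlib import Path
from typing import Callable, Optional, Union

# Mirrors the documented type: a string, a no-argument callable, or None.
VersionHint = Union[None, str, Callable[[], Optional[str]]]


def resolve_version(version: VersionHint) -> Optional[str]:
    """Call the hint if it is callable, otherwise return it unchanged."""
    if callable(version):
        return version()
    return version


def versioned_path(base: Path, *subkeys: str, name: str, version: VersionHint = None) -> Path:
    """Hypothetical layout: base / [version] / subkeys... / name."""
    resolved = resolve_version(version)
    parts = (resolved, *subkeys) if resolved is not None else subkeys
    return base.joinpath(*parts, name)


path = versioned_path(Path("rhea"), "rdf", name="rhea.rdf.gz", version=lambda: "123")
```

Passing a callable defers the version lookup until the path is actually needed, which is why the Rhea example hands over `get_rhea_version` rather than its result.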
- ensure_csv(*subkeys: str, url: str, name: str | None = None, force: bool = False, version: VersionHint = None, download_kwargs: DownloadKwargs | None = None, read_csv_kwargs: Mapping[str, Any] | None = None) -> pd.DataFrame
Download a CSV and open as a dataframe with pandas.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
version – The optional version, or no-argument callable that returns an optional version. This is prepended before the subkeys.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
read_csv_kwargs – Keyword arguments to pass through to pandas.read_csv().
- Returns:
A pandas DataFrame
- ensure_custom(*subkeys: str, name: str, force: bool = False, provider: Callable[[...], None], **kwargs: Any) -> Path
Ensure a file is present, and run a custom create function otherwise.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The file name.
force – Should the file be re-created, even if the path already exists?
provider – The file provider. Will be run with the path as the first positional argument, if the file needs to be generated.
kwargs – Additional keyword-based parameters passed to the provider.
- Returns:
The path of the file that has been created (or already exists)
- Raises:
ValueError – If the provider was called but the file was not created by it.
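The provider contract described above (called with the path first, skipped when the file exists, `ValueError` when nothing is created) can be sketched with the standard library. `ensure_custom_sketch` is a hypothetical function that takes a ready-made path instead of subkeys and a name; it is an illustration under those assumptions only.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, List


def ensure_custom_sketch(
    path: Path, provider: Callable[..., None], force: bool = False, **kwargs: Any
) -> Path:
    """Hypothetical stand-in for the provider contract."""
    if path.is_file() and not force:
        return path  # already present, provider is not run
    provider(path, **kwargs)  # path is the first positional argument
    if not path.is_file():
        raise ValueError(f"provider did not create {path}")
    return path


calls: List[str] = []


def write_greeting(path: Path, greeting: str) -> None:
    calls.append(greeting)
    path.write_text(greeting)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "greeting.txt"
    ensure_custom_sketch(target, write_greeting, greeting="hello")
    # Second call finds the file and skips the provider.
    ensure_custom_sketch(target, write_greeting, greeting="ignored")
    content = target.read_text()
    try:
        ensure_custom_sketch(Path(tmp) / "missing.txt", lambda path: None)
        raised = False
    except ValueError:
        raised = True
```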
- ensure_excel(*subkeys: str, url: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, read_excel_kwargs: Mapping[str, Any] | None = None) -> pd.DataFrame
Download an Excel file and open as a dataframe with pandas.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
read_excel_kwargs – Keyword arguments to pass through to pandas.read_excel().
- Returns:
A pandas DataFrame
- ensure_from_google(*subkeys: str, name: str, file_id: str, force: bool = False, download_kwargs: Mapping[str, Any] | None = None) -> Path
Ensure a file is downloaded from Google Drive.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file
file_id – The file identifier of the Google file. If your share link is https://drive.google.com/file/d/1AsPPU4ka1Rc9u-XYMGWtvV65hF3egi0z/view, then your file ID is 1AsPPU4ka1Rc9u-XYMGWtvV65hF3egi0z.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to
pystow.utils.download_from_google().
- Returns:
The path of the file that has been downloaded (or already exists)
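If only a share link is at hand, the file ID can be pulled out before calling this method. The helper below is a hypothetical convenience, not part of pystow, and it assumes the link follows the `/file/d/<id>/` shape shown in the `file_id` description.

```python
import re
from typing import Optional

# Matches the identifier segment in links shaped like .../file/d/<id>/view
_FILE_ID_PATTERN = re.compile(r"/file/d/([^/]+)")


def extract_file_id(share_link: str) -> Optional[str]:
    """Return the file ID from a share link, or None if the shape differs."""
    match = _FILE_ID_PATTERN.search(share_link)
    return match.group(1) if match else None


file_id = extract_file_id(
    "https://drive.google.com/file/d/1AsPPU4ka1Rc9u-XYMGWtvV65hF3egi0z/view"
)
```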
- ensure_from_s3(*subkeys: str, s3_bucket: str, s3_key: str | Sequence[str], name: str | None = None, client: botocore.client.BaseClient | None = None, client_kwargs: Mapping[str, Any] | None = None, download_file_kwargs: Mapping[str, Any] | None = None, force: bool = False) -> Path
Ensure a file is downloaded.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
s3_bucket – The S3 bucket name
s3_key – The S3 key name
name – Overrides the name of the file at the end of the S3 key, if given.
client – A botocore client. If none given, one will be created automatically
client_kwargs – Keyword arguments to be passed to the client on instantiation.
download_file_kwargs – Keyword arguments to be passed to boto3.s3.transfer.S3Transfer.download_file().
force – Should the download be done again, even if the path already exists? Defaults to false.
- Returns:
The path of the file that has been downloaded (or already exists)
- ensure_gunzip(*subkeys: str, url: str, name: str | None = None, force: bool = False, autoclean: bool = True, download_kwargs: DownloadKwargs | None = None) -> Path
Ensure a tar.gz file is downloaded and unarchived.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
autoclean – Should the zipped file be deleted?
download_kwargs – Keyword arguments to pass through to
pystow.utils.download().
- Returns:
The path of the directory where the file that has been downloaded gets extracted to
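The decompress-then-clean-up step suggested by `autoclean` might look like the following standard-library sketch. `gunzip_sketch` is hypothetical; it assumes the output sits next to the archive with the `.gz` suffix dropped, which the documentation above does not state.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_sketch(archive: Path, autoclean: bool = True) -> Path:
    """Hypothetical helper: decompress next to the archive, optionally delete it."""
    target = archive.with_suffix("")  # "data.txt.gz" -> "data.txt"
    with gzip.open(archive, "rb") as source, open(target, "wb") as sink:
        shutil.copyfileobj(source, sink)
    if autoclean:
        archive.unlink()  # remove the compressed file once unpacked
    return target


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "data.txt.gz"
    with gzip.open(archive, "wt", encoding="utf-8") as file:
        file.write("hello")
    unpacked = gunzip_sketch(archive)
    unpacked_name = unpacked.name
    unpacked_text = unpacked.read_text(encoding="utf-8")
    archive_removed = not archive.exists()
```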
- ensure_json(*subkeys: str, url: str, name: str | None = None, force: bool = False, version: None | str | Callable[[], str | None] = None, download_kwargs: DownloadKwargs | None = None, open_kwargs: Mapping[str, Any] | None = None, json_load_kwargs: Mapping[str, Any] | None = None) -> Any
Download JSON and open with json.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
version – The optional version, or no-argument callable that returns an optional version. This is prepended before the subkeys.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
open_kwargs – Additional keyword arguments passed to open().
json_load_kwargs – Keyword arguments to pass through to json.load().
- Returns:
A JSON object (list, dict, etc.)
- ensure_json_bz2(*subkeys: str, url: str, name: str | None = None, force: bool = False, version: None | str | Callable[[], str | None] = None, download_kwargs: DownloadKwargs | None = None, open_kwargs: Mapping[str, Any] | None = None, json_load_kwargs: Mapping[str, Any] | None = None) -> Any
Download BZ2-compressed JSON and open with json.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
version – The optional version, or no-argument callable that returns an optional version. This is prepended before the subkeys.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
open_kwargs – Additional keyword arguments passed to bz2.open().
json_load_kwargs – Keyword arguments to pass through to json.load().
- Returns:
A JSON object (list, dict, etc.)
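Once the file is on disk, the reading step pairs `bz2.open()` with `json.load()`. A minimal sketch of that pairing, assuming text mode (`load_json_bz2_sketch` is a hypothetical name and does no downloading):

```python
import bz2
import json
import tempfile
from pathlib import Path
from typing import Any, Mapping, Optional


def load_json_bz2_sketch(
    path: Path,
    open_kwargs: Optional[Mapping[str, Any]] = None,
    json_load_kwargs: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Hypothetical stand-in: open_kwargs -> bz2.open(), json_load_kwargs -> json.load()."""
    with bz2.open(path, mode="rt", **(open_kwargs or {})) as file:
        return json.load(file, **(json_load_kwargs or {}))


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "records.json.bz2"
    # Write a small compressed JSON document to read back.
    with bz2.open(archive, mode="wt", encoding="utf-8") as file:
        json.dump({"a": [1, 2, 3]}, file)
    loaded = load_json_bz2_sketch(archive, open_kwargs={"encoding": "utf-8"})
```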
- ensure_open(*subkeys: str, url: str, name: str | None, version: VersionHint = None, force: bool, download_kwargs: DownloadKwargs | None, mode: Literal['r', 'rt', 'w', 'wt'] = 'r', open_kwargs: Mapping[str, Any] | None) -> Generator[StringIO, None, None]
- ensure_open(*subkeys: str, url: str, name: str | None, version: VersionHint = None, force: bool, download_kwargs: DownloadKwargs | None, mode: Literal['rb', 'wb'] = 'r', open_kwargs: Mapping[str, Any] | None) -> Generator[BytesIO, None, None]
Ensure a file is downloaded and open it.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
version – The optional version, or no-argument callable that returns an optional version. This is prepended before the subkeys.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
mode – The read mode, passed to open().
open_kwargs – Additional keyword arguments passed to open().
- Yields:
An open file object
- ensure_open_bz2(*subkeys: str, url: str, name: str | None = None, force: bool = False, version: None | str | Callable[[], str | None] = None, download_kwargs: DownloadKwargs | None = None, mode: Literal['rb'] = 'rb', open_kwargs: Mapping[str, Any] | None = None) -> Generator[BZ2File, None, None]
Ensure a BZ2-compressed file is downloaded and open a file inside it.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
version – The optional version, or no-argument callable that returns an optional version. This is prepended before the subkeys.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
mode – The read mode, passed to bz2.open().
open_kwargs – Additional keyword arguments passed to bz2.open().
- Yields:
An open file object
- ensure_open_gz(*subkeys: str, url: str, name: str | None, force: bool, download_kwargs: DownloadKwargs | None, mode: Literal['r', 'w', 'rt', 'wt'] = 'rb', open_kwargs: Mapping[str, Any] | None) -> Generator[StringIO, None, None]
- ensure_open_gz(*subkeys: str, url: str, name: str | None, force: bool, download_kwargs: DownloadKwargs | None, mode: Literal['rb', 'wb'] = 'rb', open_kwargs: Mapping[str, Any] | None) -> Generator[BytesIO, None, None]
Ensure a gzipped file is downloaded and open a file inside it.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
version – The optional version, or no-argument callable that returns an optional version. This is prepended before the subkeys.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
mode – The read mode, passed to gzip.open().
open_kwargs – Additional keyword arguments passed to gzip.open().
- Yields:
An open file object
- ensure_open_lzma(*subkeys: str, url: str, name: str | None, force: bool, download_kwargs: DownloadKwargs | None, mode: Literal['r', 'w', 'rt', 'wt'] = 'rt', open_kwargs: Mapping[str, Any] | None) -> Generator[io.TextIOWrapper[lzma.LZMAFile], None, None]
- ensure_open_lzma(*subkeys: str, url: str, name: str | None, force: bool, download_kwargs: DownloadKwargs | None, mode: Literal['rb', 'wb'] = 'rt', open_kwargs: Mapping[str, Any] | None) -> Generator[lzma.LZMAFile, None, None]
Ensure an LZMA-compressed file is downloaded and open a file inside it.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
mode – The read mode, passed to lzma.open().
open_kwargs – Additional keyword arguments passed to lzma.open().
- Yields:
An open file object
- ensure_open_sqlite(*subkeys: str, url: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None) -> Generator[Connection, None, None]
Ensure and connect to a SQLite database.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to
pystow.utils.download().
- Yields:
An instance of sqlite3.Connection from sqlite3.connect()
Example usage:
>>> import pystow
>>> import pandas as pd
>>> url = "https://s3.amazonaws.com/bbop-sqlite/obi.db"
>>> sql = "SELECT * FROM entailed_edge LIMIT 10"
>>> module = pystow.module("test")
>>> with module.ensure_open_sqlite(url=url) as conn:
...     df = pd.read_sql(sql, conn)
- ensure_open_sqlite_gz(*subkeys: str, url: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None) -> Generator[Connection, None, None]
Ensure and connect to a SQLite database that’s gzipped.
Unfortunately, it’s a paid feature to directly read gzipped sqlite files, so this automatically gunzips it first.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to
pystow.utils.download().
- Yields:
An instance of sqlite3.Connection from sqlite3.connect()
Example usage:
>>> import pystow
>>> import pandas as pd
>>> url = "https://s3.amazonaws.com/bbop-sqlite/hp.db.gz"
>>> module = pystow.module("test")
>>> sql = "SELECT * FROM entailed_edge LIMIT 10"
>>> with module.ensure_open_sqlite_gz(url=url) as conn:
...     df = pd.read_sql(sql, conn)
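The gunzip-first behaviour can be pictured with a local, network-free sketch. `open_sqlite_gz_sketch` is a hypothetical context manager built on the standard library; the file names and the choice to reuse an already decompressed database are assumptions, not documented pystow behaviour.

```python
import gzip
import shutil
import sqlite3
import tempfile
from contextlib import closing, contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def open_sqlite_gz_sketch(archive: Path) -> Iterator[sqlite3.Connection]:
    """Hypothetical stand-in: decompress once, then yield a connection."""
    database = archive.with_suffix("")  # "example.db.gz" -> "example.db"
    if not database.is_file():
        with gzip.open(archive, "rb") as source, open(database, "wb") as sink:
            shutil.copyfileobj(source, sink)
    # closing() makes sure the connection is closed when the block exits.
    with closing(sqlite3.connect(database)) as conn:
        yield conn


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny database and gzip it to stand in for a downloaded file.
    plain = Path(tmp) / "seed.db"
    with closing(sqlite3.connect(plain)) as seed:
        seed.execute("CREATE TABLE edge (subject TEXT, object TEXT)")
        seed.execute("INSERT INTO edge VALUES ('a', 'b')")
        seed.commit()
    archive = Path(tmp) / "example.db.gz"
    with open(plain, "rb") as source, gzip.open(archive, "wb") as sink:
        shutil.copyfileobj(source, sink)

    with open_sqlite_gz_sketch(archive) as conn:
        rows = conn.execute("SELECT * FROM edge").fetchall()
```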
- ensure_open_tarfile(*subkeys: str, url: str, inner_path: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, mode: Literal['rt'] = 'r', open_kwargs: Mapping[str, Any] | None = None) -> Generator[TextIO, None, None]
- ensure_open_tarfile(*subkeys: str, url: str, inner_path: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, mode: Literal['r', 'rb'] = 'r', open_kwargs: Mapping[str, Any] | None = None) -> Generator[IO[bytes], None, None]
Ensure a tar file is downloaded and open a file inside it.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
inner_path – The relative path to the file inside the archive
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
mode – The read mode, passed to tarfile.open().
open_kwargs – Additional keyword arguments passed to tarfile.open().
- Yields:
An open file object
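Reading a single member without unpacking the whole archive can be sketched with `tarfile` alone. The context manager below is hypothetical and only illustrates the text-versus-bytes distinction in the overloads above; it assumes UTF-8 for text mode.

```python
import io
import tarfile
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def open_tar_member_sketch(archive: Path, inner_path: str, mode: str = "r"):
    """Hypothetical stand-in: yield one member as text ("rt") or bytes."""
    with tarfile.open(archive, "r") as tar:
        member = tar.extractfile(inner_path)
        if member is None:  # directories and links have no file content
            raise FileNotFoundError(inner_path)
        if mode == "rt":
            # Wrap the binary member so callers receive decoded text.
            with io.TextIOWrapper(member, encoding="utf-8") as text:
                yield text
        else:
            with member:
                yield member


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "bundle.tar"
    payload = b"x\ty\n1\t2\n"
    with tarfile.open(archive, "w") as tar:
        info = tarfile.TarInfo("inner/table.tsv")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    with open_tar_member_sketch(archive, "inner/table.tsv", mode="rt") as file:
        first_line = file.readline()
    with open_tar_member_sketch(archive, "inner/table.tsv", mode="rb") as file:
        raw = file.read()
```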
- ensure_open_zip(*subkeys: str, url: str, inner_path: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, mode: Literal['r', 'rb'] = 'r', open_kwargs: Mapping[str, Any] | None = None) -> Generator[BinaryIO, None, None]
- ensure_open_zip(*subkeys: str, url: str, inner_path: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, mode: Literal['rt'] = 'r', open_kwargs: Mapping[str, Any] | None = None) -> Generator[TextIO, None, None]
Ensure a file is downloaded then open it with zipfile.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
inner_path – The relative path to the file inside the archive
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
mode – The read mode, passed to zipfile.open(). Defaults to bytes mode for r and w.
zipfile_kwargs – Additional keyword arguments passed to zipfile.ZipFile.
open_kwargs – Additional keyword arguments passed to zipfile.open().
- Yields:
An open file object
- ensure_pickle(*subkeys: str, url: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, mode: Literal['rb'] = 'rb', open_kwargs: Mapping[str, Any] | None = None, pickle_load_kwargs: Mapping[str, Any] | None = None) -> Any
Download a pickle file and open with pickle.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
mode – The read mode, passed to open().
open_kwargs – Additional keyword arguments passed to open().
pickle_load_kwargs – Keyword arguments to pass through to pickle.load().
- Returns:
Any object
- ensure_pickle_gz(*subkeys: str, url: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, mode: Literal['rb'] = 'rb', open_kwargs: Mapping[str, Any] | None = None, pickle_load_kwargs: Mapping[str, Any] | None = None) -> Any
Download a gzipped pickle file and open with pickle.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
mode – The read mode, passed to gzip.open().
open_kwargs – Additional keyword arguments passed to gzip.open().
pickle_load_kwargs – Keyword arguments to pass through to pickle.load().
- Returns:
Any object
- ensure_rdf(*subkeys: str, url: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, precache: bool = True, parse_kwargs: Mapping[str, Any] | None = None) -> rdflib.Graph
Download an RDF file and open with rdflib.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
precache – Should the parsed rdflib.Graph be stored as a pickle for fast loading?
parse_kwargs – Keyword arguments to pass through to pystow.utils.read_rdf() and transitively to rdflib.Graph.parse().
- Returns:
An RDF graph
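The `precache` option describes a parse-once, pickle-afterwards pattern. The sketch below shows that pattern with a stand-in parser in place of rdflib so it runs with the standard library alone; `load_with_precache`, the `.pickle` cache suffix, and `parse_lines` are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, List


def load_with_precache(path: Path, parse: Callable[[Path], Any], precache: bool = True) -> Any:
    """Hypothetical stand-in: reuse a pickled parse result when one exists."""
    cache_path = path.with_suffix(path.suffix + ".pickle")  # assumed cache name
    if precache and cache_path.is_file():
        with open(cache_path, "rb") as file:
            return pickle.load(file)
    parsed = parse(path)  # the slow step that precaching avoids
    if precache:
        with open(cache_path, "wb") as file:
            pickle.dump(parsed, file)
    return parsed


parse_calls: List[str] = []


def parse_lines(path: Path) -> List[str]:
    """Stand-in for an expensive parser such as an RDF reader."""
    parse_calls.append(path.name)
    return path.read_text().splitlines()


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "graph.nt"
    source.write_text("triple one\ntriple two\n")
    first = load_with_precache(source, parse_lines)
    second = load_with_precache(source, parse_lines)  # served from the pickle
```

As with any pickle-based cache, only load cache files that you wrote yourself.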
- ensure_soup(*subkeys: str, url: str, name: str | None = None, version: VersionHint = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, mode: Literal['r', 'rt', 'w', 'wt'] | Literal['rb', 'wb'] = 'r', open_kwargs: Mapping[str, Any] | None = None, beautiful_soup_kwargs: Mapping[str, Any] | None = None) -> bs4.BeautifulSoup
Ensure a webpage is downloaded and parsed with BeautifulSoup.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
version – The optional version, or no-argument callable that returns an optional version. This is prepended before the subkeys.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
mode – The read mode, passed to open().
open_kwargs – Additional keyword arguments passed to open().
beautiful_soup_kwargs – Additional keyword arguments passed to BeautifulSoup.
- Returns:
A BeautifulSoup object
Note
If you don’t need to cache, consider using pystow.utils.get_soup() instead.
- ensure_tar_df(*subkeys: str, url: str, inner_path: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, read_csv_kwargs: Mapping[str, Any] | None = None) -> pd.DataFrame
Download a tar file and open an inner file as a dataframe with pandas.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
inner_path – The relative path to the file inside the archive
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
read_csv_kwargs – Keyword arguments to pass through to pandas.read_csv().
- Returns:
A dataframe
Warning
If you have lots of files to read in the same archive, it’s better just to unzip first.
- ensure_tar_xml(*subkeys: str, url: str, inner_path: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, parse_kwargs: Mapping[str, Any] | None = None) -> lxml.etree.ElementTree
Download a tar file and open an inner file as XML with lxml.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
inner_path – The relative path to the file inside the archive
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
parse_kwargs – Keyword arguments to pass through to lxml.etree.parse().
- Returns:
An ElementTree object
Warning
If you have lots of files to read in the same archive, it’s better just to unzip first.
- ensure_untar(*subkeys: str, url: str, name: str | None = None, directory: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, extract_kwargs: Mapping[str, Any] | None = None) -> Path
Ensure a tar file is downloaded and unarchived.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
directory – Overrides the name of the directory into which the tar archive is extracted. If none given, will use the stem of the file name that gets downloaded.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
extract_kwargs – Keyword arguments to pass to tarfile.TarFile.extractall().
- Returns:
The path of the directory where the file that has been downloaded gets extracted to
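As a rough illustration of the `directory` default and the `extract_kwargs` pass-through, here is a standard-library sketch. `untar` is a hypothetical helper, and treating the "stem" as everything before the first dot of the file name is an assumption, not a statement about pystow's implementation.

```python
import tarfile
from pathlib import Path
from typing import Optional


def untar(archive: Path, directory: Optional[str] = None, **extract_kwargs) -> Path:
    """Extract ``archive`` next to itself and return the target directory."""
    # Assumption: the stem is everything before the first dot, so
    # "data.tar.gz" maps to a directory called "data".
    target = archive.parent / (directory or archive.name.split(".")[0])
    target.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python supports it.
    if hasattr(tarfile, "data_filter"):
        extract_kwargs.setdefault("filter", "data")
    with tarfile.open(archive) as tar:
        tar.extractall(target, **extract_kwargs)
    return target
```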
- ensure_xml(*subkeys: str, url: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, parse_kwargs: Mapping[str, Any] | None = None) lxml.etree.ElementTree[source]
Download an XML file and open it with lxml.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
parse_kwargs – Keyword arguments to pass through to lxml.etree.parse().
- Returns:
An ElementTree object
Warning
If you have lots of files to read in the same archive, it’s better just to unzip first.
- ensure_yaml(*subkeys: str, url: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, open_kwargs: Mapping[str, Any] | None = None, yaml_load_kwargs: Mapping[str, Any] | None = None) Any[source]
Download YAML and open with yaml.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
open_kwargs – Additional keyword arguments passed to open()
yaml_load_kwargs – Keyword arguments to pass through to yaml.safe_load().
- Returns:
The parsed YAML content (list, dict, etc.)
- ensure_zip_df(*subkeys: str, url: str, inner_path: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, read_csv_kwargs: Mapping[str, Any] | None = None) pd.DataFrame[source]
Download a zip file and open an inner file as a dataframe with pandas.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
inner_path – The relative path to the file inside the archive
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
read_csv_kwargs – Keyword arguments to pass through to pandas.read_csv().
- Returns:
A pandas DataFrame
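To make `inner_path` concrete for zip archives, the sketch below reads a single member with the standard library only. `read_zip_member` is a hypothetical name, and `csv.DictReader` stands in for `pandas.read_csv()` so the example has no third-party dependencies.

```python
import csv
import io
import zipfile
from pathlib import Path


def read_zip_member(archive: Path, inner_path: str, **reader_kwargs) -> list:
    """Read one CSV file from inside a zip archive as a list of row dicts."""
    with zipfile.ZipFile(archive) as zf:
        # inner_path is relative to the archive root, e.g. "inner/data.csv".
        with zf.open(inner_path) as binary:
            with io.TextIOWrapper(binary, encoding="utf-8", newline="") as text:
                # The first row supplies the column names.
                return list(csv.DictReader(text, **reader_kwargs))
```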
- ensure_zip_np(*subkeys: str, url: str, inner_path: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, load_kwargs: Mapping[str, Any] | None = None) numpy.typing.ArrayLike[source]
Download a zip file and open an inner file as an array-like with numpy.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
url – The URL to download.
inner_path – The relative path to the file inside the archive
name – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs – Keyword arguments to pass through to pystow.utils.download().
load_kwargs – Additional keyword arguments that are passed through to read_zip_np() and transitively to numpy.load().
- Returns:
An array-like object
- classmethod from_key(key: str, *subkeys: str, ensure_exists: bool = True) Module[source]
Get a module for the given directory or one of its subdirectories.
- Parameters:
key – The name of the module. No funny characters. The envvar <key>_HOME where key is uppercased is checked first before using the default home directory.
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
ensure_exists – Should all directories be created automatically? Defaults to true.
- Returns:
A module
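The lookup order described for `key` can be sketched as follows. `resolve_home` is a hypothetical helper, and the `~/.data` fallback is an assumption made for illustration; the default home directory is not stated in this section.

```python
import os
from pathlib import Path


def resolve_home(key: str) -> Path:
    """Return the base directory for ``key`` (hypothetical helper)."""
    # The environment variable <KEY>_HOME takes precedence when it is set.
    override = os.environ.get(f"{key.upper()}_HOME")
    if override:
        return Path(override).expanduser()
    # Assumption: the default home directory is ~/.data.
    return Path.home() / ".data" / key
```

Under these assumptions, setting `RHEA_HOME=/mnt/rhea` would redirect the `rhea` module to `/mnt/rhea`.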
- join(*subkeys: str, name: str | None = None, ensure_exists: bool = True, version: None | str | Callable[[], str | None] = None, return_version: Literal[True]) tuple[Path, str | None][source]
- join(*subkeys: str, name: str | None = None, ensure_exists: bool = True, version: None | str | Callable[[], str | None] = None, return_version: Literal[False]) Path
- join(*subkeys: str, name: str | None = None, ensure_exists: bool = True, version: None | str | Callable[[], str | None] = None) Path
Get a subdirectory of the current module.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
ensure_exists – Should all directories be created automatically? Defaults to true.
name – The name of the file (optional) inside the folder
version – The optional version, or no-argument callable that returns an optional version. This is prepended before the subkeys.
The following example describes how to store the versioned data from the Rhea database for biologically relevant chemical reactions.

    import pystow
    import requests

    def get_rhea_version() -> str:
        res = requests.get("https://ftp.expasy.org/databases/rhea/rhea-release.properties")
        _, _, version = res.text.splitlines()[0].partition("=")
        return version

    # Assume you want to download the data from
    # ftp://ftp.expasy.org/databases/rhea/rdf/rhea.rdf.gz, make a path
    # with the same name
    module = pystow.module("rhea")
    path = module.join(name="rhea.rdf.gz", version=get_rhea_version)

return_version – If true, returns the processed version
- Returns:
The path of the directory or subdirectory for the given module.
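How the version, subkeys, and name combine into a path can be sketched in a few lines. `join_path` is a hypothetical stand-in that only builds the path; it does not create directories and is not pystow's implementation.

```python
from pathlib import Path
from typing import Callable, Optional, Union

Version = Union[None, str, Callable[[], Optional[str]]]


def join_path(
    base: Path, *subkeys: str, name: Optional[str] = None, version: Version = None
) -> Path:
    """Build base/[version]/subkeys.../[name] without touching the disk."""
    # A callable version is resolved lazily, only when the path is built.
    resolved = version() if callable(version) else version
    # The version, when present, goes before the subkeys.
    parts = ([resolved] if resolved else []) + list(subkeys)
    directory = base.joinpath(*parts)
    return directory / name if name else directory
```

Passing a callable rather than a string means the version lookup (for example, an HTTP request) only runs when a path is actually needed.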
- joinpath_sqlite(*subkeys: str, name: str) str[source]
Get an SQLite database connection string.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the database file.
- Returns:
A SQLite path string.
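The exact format of the returned string is not given here. A plausible sketch, assuming it follows the SQLAlchemy-style `sqlite:///<path>` URL convention, looks like this; `sqlite_url` is a hypothetical helper.

```python
from pathlib import Path


def sqlite_url(path: Path) -> str:
    """Format a file path as a SQLAlchemy-style SQLite URL (assumed convention)."""
    # as_posix() keeps forward slashes on every platform.
    return f"sqlite:///{path.as_posix()}"
```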
- load_df(*subkeys: str, name: str, read_csv_kwargs: Mapping[str, Any] | None = None) pd.DataFrame[source]
Open a pre-existing CSV as a dataframe with pandas.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to open
read_csv_kwargs – Keyword arguments to pass through to pandas.read_csv().
- Returns:
A pandas DataFrame
- load_json(*subkeys: str, name: str, open_kwargs: Mapping[str, Any] | None = None, json_load_kwargs: Mapping[str, Any] | None = None) Any[source]
Open a JSON file with json.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to open
open_kwargs – Additional keyword arguments passed to open()
json_load_kwargs – Keyword arguments to pass through to json.load().
- Returns:
A JSON object (list, dict, etc.)
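The two keyword mappings go to different calls, which the following standard-library sketch makes explicit. `load_json_file` is a hypothetical function that takes a full path instead of subkeys and a name.

```python
import json
from pathlib import Path
from typing import Any, Mapping, Optional


def load_json_file(
    path: Path,
    open_kwargs: Optional[Mapping[str, Any]] = None,
    json_load_kwargs: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Open ``path`` and parse it as JSON, forwarding both keyword mappings."""
    # open_kwargs reach open(); json_load_kwargs reach json.load().
    with open(path, **(open_kwargs or {})) as file:
        return json.load(file, **(json_load_kwargs or {}))
```

For instance, `open_kwargs={"encoding": "utf-8"}` controls decoding, while `json_load_kwargs={"parse_float": str}` changes how numbers are parsed.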
- load_pickle(*subkeys: str, name: str, mode: Literal['rb'] = 'rb', open_kwargs: Mapping[str, Any] | None = None, pickle_load_kwargs: Mapping[str, Any] | None = None) Any[source]
Open a pickle file with pickle.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to open
mode – The read mode, passed to open()
open_kwargs – Additional keyword arguments passed to open()
pickle_load_kwargs – Keyword arguments to pass through to pickle.load().
- Returns:
Any object
- load_pickle_gz(*subkeys: str, name: str, mode: Literal['rb'] = 'rb', open_kwargs: Mapping[str, Any] | None = None, pickle_load_kwargs: Mapping[str, Any] | None = None) Any[source]
Open a gzipped pickle file with pickle.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to open
mode – The read mode, passed to open()
open_kwargs – Additional keyword arguments passed to gzip.open()
pickle_load_kwargs – Keyword arguments to pass through to pickle.load().
- Returns:
Any object
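A gzipped pickle is read by layering `pickle.load()` over `gzip.open()`. The sketch below shows that layering with the standard library; `load_pickle_gz_file` is a hypothetical name, and it takes a full path rather than subkeys and a name.

```python
import gzip
import pickle
from pathlib import Path
from typing import Any, Mapping, Optional


def load_pickle_gz_file(
    path: Path,
    mode: str = "rb",
    open_kwargs: Optional[Mapping[str, Any]] = None,
    pickle_load_kwargs: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Decompress ``path`` on the fly and unpickle its content."""
    # Pickle data is binary, so the mode stays "rb".
    with gzip.open(path, mode=mode, **(open_kwargs or {})) as file:
        return pickle.load(file, **(pickle_load_kwargs or {}))
```

As with any use of `pickle`, only load files from sources you trust, since unpickling can execute arbitrary code.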
- load_rdf(*subkeys: str, name: str | None = None, parse_kwargs: Mapping[str, Any] | None = None) rdflib.Graph[source]
Open an RDF file with rdflib.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to open
parse_kwargs – Keyword arguments to pass through to pystow.utils.read_rdf() and transitively to rdflib.Graph.parse().
- Returns:
An RDF graph
- load_xml(*subkeys: str, name: str, parse_kwargs: Mapping[str, Any] | None = None) lxml.etree.ElementTree[source]
Load an XML file with lxml.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to open
parse_kwargs – Keyword arguments to pass through to lxml.etree.parse().
- Returns:
An ElementTree object
Warning
If you have lots of files to read in the same archive, it’s better just to unzip first.
- load_yaml(*subkeys: str, name: str, open_kwargs: Mapping[str, Any] | None = None, yaml_load_kwargs: Mapping[str, Any] | None = None) Any[source]
Open a YAML file with yaml.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to open
open_kwargs – Additional keyword arguments passed to open()
yaml_load_kwargs – Keyword arguments to pass through to yaml.safe_load().
- Returns:
The parsed YAML content (list, dict, etc.)
- module(*subkeys: str, ensure_exists: bool = True) Module[source]
Get a module for a subdirectory of the current module.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
ensure_exists – Should all directories be created automatically? Defaults to true.
- Returns:
A module representing the subdirectory based on the given subkeys.
- open(*subkeys: str, name: str, mode: Literal['r', 'rt', 'w', 'wt'] = 'r', open_kwargs: Mapping[str, Any] | None = None, ensure_exists: bool) Generator[StringIO, None, None][source]
- open(*subkeys: str, name: str, mode: Literal['rb', 'wb'] = 'r', open_kwargs: Mapping[str, Any] | None = None, ensure_exists: bool) Generator[BytesIO, None, None]
Open a file.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to open
mode – The read mode, passed to open()
open_kwargs – Additional keyword arguments passed to open()
ensure_exists – Should the directory the file is in be made? Set to true on write operations.
- Yields:
An open file object.
This function should be used as a context manager, as in the following example:

    import pystow

    with pystow.module("test").open(name="test.tsv", mode="w") as file:
        print("Test text!", file=file)

- open_gz(*subkeys: str, name: str, mode: Literal['r', 'w', 'rt', 'wt'] = 'rb', open_kwargs: Mapping[str, Any] | None, ensure_exists: bool) Generator[StringIO, None, None][source]
- open_gz(*subkeys: str, name: str, mode: Literal['rb', 'wb'] = 'rb', open_kwargs: Mapping[str, Any] | None, ensure_exists: bool) Generator[BytesIO, None, None]
Open a gzipped file that exists already.
- Parameters:
subkeys – A sequence of additional strings to join. If none are given, returns the directory for this module.
name – The name of the file to open
mode – The read mode, passed to gzip.open()
open_kwargs – Additional keyword arguments passed to gzip.open()
ensure_exists – Should the file be made? Set to true on write operations.
- Yields:
An open file object
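With Python's `gzip.open()`, a plain `"r"` or `"w"` mode is binary, so a text stream needs an explicit `"rt"` or `"wt"`. The sketch below wraps `gzip.open()` in a generator-based context manager to show the difference; `open_gz_file` is a hypothetical helper that takes a full path, not pystow's own method.

```python
import gzip
from contextlib import contextmanager
from pathlib import Path
from typing import IO, Any, Generator, Mapping, Optional


@contextmanager
def open_gz_file(
    path: Path, mode: str = "rb", open_kwargs: Optional[Mapping[str, Any]] = None
) -> Generator[IO[Any], None, None]:
    """Yield an open gzip stream and close it when the block exits."""
    with gzip.open(path, mode=mode, **(open_kwargs or {})) as file:
        yield file


def roundtrip(path: Path, text: str) -> str:
    """Write text with "wt", then read it back with "rt"."""
    with open_gz_file(path, mode="wt", open_kwargs={"encoding": "utf-8"}) as file:
        file.write(text)
    with open_gz_file(path, mode="rt", open_kwargs={"encoding": "utf-8"}) as file:
        return file.read()
```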