Module
- class Module(base, ensure_exists=True)[source]
Bases: object
The class wrapping the directory lookup implementation.
Initialize the module.
- Parameters:
Methods Summary
dump_df(*subkeys, name, obj[, sep, index, ...]) – Dump a dataframe to a TSV file with pandas.
dump_json(*subkeys, name, obj[, ...]) – Dump an object to a file with json.
dump_pickle(*subkeys, name, obj[, mode, ...]) – Dump an object to a file with pickle.
dump_rdf(*subkeys, name, obj[, format, ...]) – Dump an RDF graph to a file with rdflib.
dump_xml(*subkeys, name, obj[, open_kwargs, ...]) – Dump an XML element tree to a file with lxml.
ensure(*subkeys, url[, name, force, ...]) – Ensure a file is downloaded.
ensure_csv(*subkeys, url[, name, force, ...]) – Download a CSV and open as a dataframe with pandas.
ensure_custom(*subkeys, name[, force]) – Ensure a file is present, and run a custom create function otherwise.
ensure_excel(*subkeys, url[, name, force, ...]) – Download an Excel file and open as a dataframe with pandas.
ensure_from_google(*subkeys, name, file_id) – Ensure a file is downloaded from Google Drive.
ensure_from_s3(*subkeys, s3_bucket, s3_key) – Ensure a file is downloaded.
ensure_gunzip(*subkeys, url[, name, force, ...]) – Ensure a tar.gz file is downloaded and unarchived.
ensure_json(*subkeys, url[, name, force, ...]) – Download JSON and open with json.
ensure_json_bz2(*subkeys, url[, name, ...]) – Download BZ2-compressed JSON and open with json.
ensure_open(*subkeys, url[, name, force, ...]) – Ensure a file is downloaded and open it.
ensure_open_bz2(*subkeys, url[, name, ...]) – Ensure a BZ2-compressed file is downloaded and open a file inside it.
ensure_open_gz(*subkeys, url[, name, force, ...]) – Ensure a gzipped file is downloaded and open a file inside it.
ensure_open_lzma(*subkeys, url[, name, ...]) – Ensure an LZMA-compressed file is downloaded and open a file inside it.
ensure_open_sqlite(*subkeys, url[, name, ...]) – Ensure and connect to a SQLite database.
ensure_open_sqlite_gz(*subkeys, url[, name, ...]) – Ensure and connect to a SQLite database that's gzipped.
ensure_open_tarfile(*subkeys, url, inner_path) – Ensure a tar file is downloaded and open a file inside it.
ensure_open_zip(*subkeys, url, inner_path[, ...]) – Ensure a file is downloaded then open it with zipfile.
ensure_pickle(*subkeys, url[, name, force, ...]) – Download a pickle file and open with pickle.
ensure_pickle_gz(*subkeys, url[, name, ...]) – Download a gzipped pickle file and open with pickle.
ensure_rdf(*subkeys, url[, name, force, ...]) – Download an RDF file and open with rdflib.
ensure_tar_df(*subkeys, url, inner_path[, ...]) – Download a tar file and open an inner file as a dataframe with pandas.
ensure_tar_xml(*subkeys, url, inner_path[, ...]) – Download a tar file and open an inner file as an XML with lxml.
ensure_untar(*subkeys, url[, name, ...]) – Ensure a tar file is downloaded and unarchived.
ensure_xml(*subkeys, url[, name, force, ...]) – Download an XML file and open it with lxml.
ensure_zip_df(*subkeys, url, inner_path[, ...]) – Download a zip file and open an inner file as a dataframe with pandas.
ensure_zip_np(*subkeys, url, inner_path[, ...]) – Download a zip file and open an inner file as an array-like with numpy.
from_key(key, *subkeys[, ensure_exists]) – Get a module for the given directory or one of its subdirectories.
join(*subkeys[, name, ensure_exists]) – Get a subdirectory of the current module.
joinpath_sqlite(*subkeys, name) – Get an SQLite database connection string.
load_df(*subkeys, name[, read_csv_kwargs]) – Open a pre-existing CSV as a dataframe with pandas.
load_json(*subkeys, name[, open_kwargs, ...]) – Open a JSON file with json.
load_pickle(*subkeys, name[, mode, ...]) – Open a pickle file with pickle.
load_pickle_gz(*subkeys, name[, mode, ...]) – Open a gzipped pickle file with pickle.
load_rdf(*subkeys[, name, parse_kwargs]) – Open an RDF file with rdflib.
load_xml(*subkeys, name[, parse_kwargs]) – Load an XML file with lxml.
module(*subkeys[, ensure_exists]) – Get a module for a subdirectory of the current module.
open(*subkeys, name[, mode, open_kwargs, ...]) – Open a file that exists already.
open_gz(*subkeys, name[, mode, open_kwargs, ...]) – Open a gzipped file that exists already.
Methods Documentation
- dump_df(*subkeys, name, obj, sep='\t', index=False, to_csv_kwargs=None)[source]
Dump a dataframe to a TSV file with pandas.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
name (str) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
obj (DataFrame) – The dataframe to dump
sep (str) – The separator to use, defaults to a tab
index – Should the index be dumped? Defaults to false.
to_csv_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pandas.DataFrame.to_csv().
- Return type:
- dump_json(*subkeys, name, obj, open_kwargs=None, json_dump_kwargs=None)[source]
Dump an object to a file with json.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
name (str) – The name of the file to open
obj (Any) – The object to dump
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()
json_dump_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to json.dump().
- Return type:
- dump_pickle(*subkeys, name, obj, mode='wb', open_kwargs=None, pickle_dump_kwargs=None)[source]
Dump an object to a file with pickle.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
name (str) – The name of the file to open
obj (Any) – The object to dump
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()
pickle_dump_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pickle.dump().
- Return type:
- dump_rdf(*subkeys, name, obj, format='turtle', serialize_kwargs=None)[source]
Dump an RDF graph to a file with rdflib.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
name (str) – The name of the file to open
obj (Graph) – The object to dump
format (str) – The format to dump in
serialize_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to rdflib.Graph.serialize().
- dump_xml(*subkeys, name, obj, open_kwargs=None, write_kwargs=None)[source]
Dump an XML element tree to a file with lxml.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
name (str) – The name of the file to open
obj (ElementTree) – The object to dump
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()
write_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to lxml.etree.ElementTree.write().
- ensure(*subkeys, url, name=None, force=False, download_kwargs=None)[source]
Ensure a file is downloaded.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
- Return type:
- Returns:
The path of the file that has been downloaded (or already exists)
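The download-if-missing behaviour described by the `name` and `force` parameters can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not pystow's implementation: `ensure_file` and its injected `fetch` callable are hypothetical names, and deriving the default file name from the last component of the URL path is an assumption based on the parameter description above.

```python
from pathlib import Path
from typing import Callable, Optional
from urllib.parse import urlparse


def ensure_file(
    directory: Path,
    url: str,
    fetch: Callable[[str, Path], None],
    name: Optional[str] = None,
    force: bool = False,
) -> Path:
    """Return the local path for ``url``, fetching it only when needed.

    Hypothetical helper: ``fetch(url, path)`` stands in for a real downloader.
    """
    # Assumption: the default file name is the last component of the URL path.
    filename = name or Path(urlparse(url).path).name
    path = directory / filename
    # Fetch when the file is missing, or unconditionally when force is set.
    if force or not path.exists():
        directory.mkdir(parents=True, exist_ok=True)
        fetch(url, path)
    return path
```

Passing the downloader in as a callable keeps the sketch free of network access, which also makes the caching logic easy to exercise in isolation.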
- ensure_csv(*subkeys, url, name=None, force=False, download_kwargs=None, read_csv_kwargs=None)[source]
Download a CSV and open as a dataframe with pandas.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
read_csv_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pandas.read_csv().
- Returns:
A pandas DataFrame
- Return type:
- ensure_custom(*subkeys, name, force=False, provider, **kwargs)[source]
Ensure a file is present, and run a custom create function otherwise.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
name (str) – The file name.
force (bool) – Should the file be re-created, even if the path already exists?
provider (Callable[..., None]) – The file provider. Will be run with the path as the first positional argument, if the file needs to be generated.
kwargs – Additional keyword-based parameters passed to the provider.
- Raises:
ValueError – If the provider was called but the file was not created by it.
- Return type:
- Returns:
The path of the file that has been created (or already exists)
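The provider contract described above (called with the path as the first positional argument, `ValueError` if no file results) can be sketched as follows. This is an illustrative stand-in rather than the library's code; `ensure_custom_file` is a hypothetical name and takes a full path instead of subkeys to stay self-contained.

```python
from pathlib import Path
from typing import Any, Callable


def ensure_custom_file(
    path: Path,
    provider: Callable[..., None],
    force: bool = False,
    **kwargs: Any,
) -> Path:
    """Return ``path``, running ``provider`` to create it when needed."""
    # Reuse the existing file unless re-creation is forced.
    if path.exists() and not force:
        return path
    path.parent.mkdir(parents=True, exist_ok=True)
    # The provider receives the target path first, then any extra keywords.
    provider(path, **kwargs)
    if not path.exists():
        raise ValueError(f"provider was called but did not create {path}")
    return path
```

A provider is then any function such as `def write_greeting(path, text): path.write_text(text)`, invoked as `ensure_custom_file(target, write_greeting, text="hello")`.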
- ensure_excel(*subkeys, url, name=None, force=False, download_kwargs=None, read_excel_kwargs=None)[source]
Download an Excel file and open as a dataframe with pandas.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
read_excel_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pandas.read_excel().
- Return type:
- Returns:
A pandas DataFrame
- ensure_from_google(*subkeys, name, file_id, force=False, download_kwargs=None)[source]
Ensure a file is downloaded from Google Drive.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
name (str) – The name of the file
file_id (str) – The file identifier of the google file. If your share link is https://drive.google.com/file/d/1AsPPU4ka1Rc9u-XYMGWtvV65hF3egi0z/view, then your file id is 1AsPPU4ka1Rc9u-XYMGWtvV65hF3egi0z.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download_from_google().
- Return type:
- Returns:
The path of the file that has been downloaded (or already exists)
- ensure_from_s3(*subkeys, s3_bucket, s3_key, name=None, client=None, client_kwargs=None, download_file_kwargs=None, force=False)[source]
Ensure a file is downloaded.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
s3_bucket (str) – The S3 bucket name
name (Optional[str]) – Overrides the name of the file at the end of the S3 key, if given.
client (Optional[BaseClient]) – A botocore client. If none given, one will be created automatically
client_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to be passed to the client on instantiation.
download_file_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to be passed to boto3.s3.transfer.S3Transfer.download_file()
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
- Return type:
- Returns:
The path of the file that has been downloaded (or already exists)
- ensure_gunzip(*subkeys, url, name=None, force=False, autoclean=True, download_kwargs=None)[source]
Ensure a tar.gz file is downloaded and unarchived.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
autoclean (bool) – Should the zipped file be deleted?
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
- Return type:
- Returns:
The path of the directory where the file that has been downloaded gets extracted to
- ensure_json(*subkeys, url, name=None, force=False, download_kwargs=None, open_kwargs=None, json_load_kwargs=None)[source]
Download JSON and open with json.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()
json_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to json.load().
- Return type:
- Returns:
A JSON object (list, dict, etc.)
- ensure_json_bz2(*subkeys, url, name=None, force=False, download_kwargs=None, open_kwargs=None, json_load_kwargs=None)[source]
Download BZ2-compressed JSON and open with json.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to bz2.open()
json_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to json.load().
- Returns:
A JSON object (list, dict, etc.)
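Once the file is on disk, the reading step amounts to combining `bz2.open()` with `json.load()` and forwarding the two keyword mappings. The sketch below shows that pass-through pattern with the standard library; `read_json_bz2` is a hypothetical helper, and opening in text mode (`"rt"`) is an assumption of this sketch.

```python
import bz2
import json
from typing import Any, Mapping, Optional


def read_json_bz2(
    path: str,
    open_kwargs: Optional[Mapping[str, Any]] = None,
    json_load_kwargs: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Read a BZ2-compressed JSON file that already exists locally."""
    # Treat a missing mapping as "no extra keyword arguments".
    with bz2.open(path, mode="rt", **(open_kwargs or {})) as file:
        return json.load(file, **(json_load_kwargs or {}))
```

For example, `read_json_bz2(path, json_load_kwargs={"parse_int": str})` forwards `parse_int` to `json.load()`, so integers in the document come back as strings.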
- ensure_open(*subkeys, url, name=None, force=False, download_kwargs=None, mode='r', open_kwargs=None)[source]
Ensure a file is downloaded and open it.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()
- Yields:
An open file object
- Return type:
- ensure_open_bz2(*subkeys, url, name=None, force=False, download_kwargs=None, mode='rb', open_kwargs=None)[source]
Ensure a BZ2-compressed file is downloaded and open a file inside it.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
mode (str) – The read mode, passed to bz2.open()
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to bz2.open()
- Yields:
An open file object
- Return type:
- ensure_open_gz(*subkeys, url, name=None, force=False, download_kwargs=None, mode='rb', open_kwargs=None)[source]
Ensure a gzipped file is downloaded and open a file inside it.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
mode (str) – The read mode, passed to gzip.open()
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to gzip.open()
- Yields:
An open file object
- Return type:
- ensure_open_lzma(*subkeys, url, name=None, force=False, download_kwargs=None, mode='rt', open_kwargs=None)[source]
Ensure an LZMA-compressed file is downloaded and open a file inside it.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
mode (str) – The read mode, passed to lzma.open()
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to lzma.open()
- Yields:
An open file object
- Return type:
- ensure_open_sqlite(*subkeys, url, name=None, force=False, download_kwargs=None)[source]
Ensure and connect to a SQLite database.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
- Yields:
An instance of sqlite3.Connection from sqlite3.connect()
Example usage:
>>> import pystow
>>> import pandas as pd
>>> url = "https://s3.amazonaws.com/bbop-sqlite/hp.db"
>>> sql = "SELECT * FROM entailed_edge LIMIT 10"
>>> module = pystow.module("test")
>>> with module.ensure_open_sqlite(url=url) as conn:
...     df = pd.read_sql(sql, conn)
- ensure_open_sqlite_gz(*subkeys, url, name=None, force=False, download_kwargs=None)[source]
Ensure and connect to a SQLite database that’s gzipped.
Unfortunately, it’s a paid feature to directly read gzipped sqlite files, so this automatically gunzips it first.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
- Yields:
An instance of sqlite3.Connection from sqlite3.connect()
Example usage:
>>> import pystow
>>> import pandas as pd
>>> url = "https://s3.amazonaws.com/bbop-sqlite/hp.db.gz"
>>> module = pystow.module("test")
>>> sql = "SELECT * FROM entailed_edge LIMIT 10"
>>> with module.ensure_open_sqlite_gz(url=url) as conn:
...     df = pd.read_sql(sql, conn)
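The gunzip-then-connect step described above can be sketched with the standard library. This is an illustration of the idea, not the library's implementation: `open_sqlite_gz` is a hypothetical helper that works on a local `.gz` file, and writing the decompressed database next to the archive is an assumption of this sketch.

```python
import gzip
import shutil
import sqlite3
from contextlib import closing, contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def open_sqlite_gz(gz_path: Path) -> Iterator[sqlite3.Connection]:
    """Gunzip a SQLite database once, then yield a connection to it."""
    # Dropping the final suffix turns "hp.db.gz" into "hp.db".
    db_path = gz_path.with_suffix("")
    if not db_path.exists():
        with gzip.open(gz_path, "rb") as src, open(db_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    # closing() makes sure the connection is closed when the block exits.
    with closing(sqlite3.connect(str(db_path))) as conn:
        yield conn
```

The decompressed file is kept on disk, so later calls skip the gunzip step and connect directly.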
- ensure_open_tarfile(*subkeys, url, inner_path, name=None, force=False, download_kwargs=None, mode='r', open_kwargs=None)[source]
Ensure a tar file is downloaded and open a file inside it.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
inner_path (str) – The relative path to the file inside the archive
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
mode (str) – The read mode, passed to tarfile.open()
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to tarfile.open()
- Yields:
An open file object
- Return type:
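Opening a member of a tar archive by its `inner_path` can be sketched with `tarfile.open()` and `TarFile.extractfile()`. The helper name `open_tar_member` is hypothetical, and the sketch assumes the archive is already on disk.

```python
import tarfile
from contextlib import contextmanager
from typing import IO, Iterator


@contextmanager
def open_tar_member(path: str, inner_path: str, mode: str = "r") -> Iterator[IO[bytes]]:
    """Yield a binary file object for one member of a local tar archive."""
    with tarfile.open(path, mode=mode) as tar_file:
        # extractfile() returns None for members that are not regular files.
        member = tar_file.extractfile(inner_path)
        if member is None:
            raise FileNotFoundError(inner_path)
        with member:
            yield member
```

The archive stays open only for the duration of the `with` block, so the yielded object should be fully read inside it.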
- ensure_open_zip(*subkeys, url, inner_path, name=None, force=False, download_kwargs=None, mode='r', open_kwargs=None)[source]
Ensure a file is downloaded then open it with zipfile.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
inner_path (str) – The relative path to the file inside the archive
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
mode (str) – The read mode, passed to zipfile.ZipFile.open()
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to zipfile.ZipFile.open()
- Yields:
An open file object
- Return type:
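The standard-library call behind this pattern is `zipfile.ZipFile.open()`, which opens one archive member by its inner path. A minimal sketch follows, with `open_zip_member` as a hypothetical helper operating on a zip file that is already on disk.

```python
import zipfile
from contextlib import contextmanager
from typing import IO, Any, Iterator


@contextmanager
def open_zip_member(
    path: str, inner_path: str, mode: str = "r", **open_kwargs: Any
) -> Iterator[IO[bytes]]:
    """Yield a binary file object for one member of a local zip archive."""
    with zipfile.ZipFile(path) as zip_file:
        # mode and extra keywords are forwarded to ZipFile.open().
        with zip_file.open(inner_path, mode=mode, **open_kwargs) as file:
            yield file
```

`ZipFile.open()` always yields bytes, so text content needs an explicit decode or an `io.TextIOWrapper` around the yielded object.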
- ensure_pickle(*subkeys, url, name=None, force=False, download_kwargs=None, mode='rb', open_kwargs=None, pickle_load_kwargs=None)[source]
Download a pickle file and open with pickle.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()
pickle_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pickle.load().
- Return type:
- Returns:
Any object
- ensure_pickle_gz(*subkeys, url, name=None, force=False, download_kwargs=None, mode='rb', open_kwargs=None, pickle_load_kwargs=None)[source]
Download a gzipped pickle file and open with pickle.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
mode (str) – The read mode, passed to gzip.open()
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to gzip.open()
pickle_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pickle.load().
- Return type:
- Returns:
Any object
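The reading half of this method combines `gzip.open()` with `pickle.load()`. The sketch below mirrors the `mode`, `open_kwargs`, and `pickle_load_kwargs` parameters on a local file; `read_pickle_gz` is a hypothetical name, not part of the documented API.

```python
import gzip
import pickle
from typing import Any, Mapping, Optional


def read_pickle_gz(
    path: str,
    mode: str = "rb",
    open_kwargs: Optional[Mapping[str, Any]] = None,
    pickle_load_kwargs: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Load an object from a gzipped pickle file that already exists locally."""
    with gzip.open(path, mode=mode, **(open_kwargs or {})) as file:
        return pickle.load(file, **(pickle_load_kwargs or {}))
```

Unpickling can execute arbitrary code, so this pattern is only appropriate for files from a trusted source.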
- ensure_rdf(*subkeys, url, name=None, force=False, download_kwargs=None, precache=True, parse_kwargs=None)[source]
Download an RDF file and open with rdflib.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
precache (bool) – Should the parsed rdflib.Graph be stored as a pickle for fast loading?
parse_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.read_rdf() and transitively to rdflib.Graph.parse().
- Returns:
An RDF graph
- Return type:
- ensure_tar_df(*subkeys, url, inner_path, name=None, force=False, download_kwargs=None, read_csv_kwargs=None)[source]
Download a tar file and open an inner file as a dataframe with pandas.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
inner_path (str) – The relative path to the file inside the archive
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
read_csv_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pandas.read_csv().
- Return type:
- Returns:
A dataframe
Warning
If you have lots of files to read in the same archive, it’s better just to unzip first.
- ensure_tar_xml(*subkeys, url, inner_path, name=None, force=False, download_kwargs=None, parse_kwargs=None)[source]
Download a tar file and open an inner file as an XML with lxml.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
inner_path (str) – The relative path to the file inside the archive
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
parse_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to lxml.etree.parse().
- Returns:
An ElementTree object
Warning
If you have lots of files to read in the same archive, it’s better just to unzip first.
- ensure_untar(*subkeys, url, name=None, directory=None, force=False, download_kwargs=None, extract_kwargs=None)[source]
Ensure a tar file is downloaded and unarchived.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
directory (Optional[str]) – Overrides the name of the directory into which the tar archive is extracted. If none given, will use the stem of the file name that gets downloaded.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
extract_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to tarfile.TarFile.extractall().
- Return type:
- Returns:
The path of the directory where the file that has been downloaded gets extracted to
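The extraction step, including the `directory` override and its default of the archive's stem, can be sketched with `tarfile.TarFile.extractall()`. The helper `untar` is hypothetical and works on a local archive; skipping extraction when the target directory already exists is an assumption of this sketch.

```python
import tarfile
from pathlib import Path
from typing import Any, Optional


def untar(path: Path, directory: Optional[str] = None, **extract_kwargs: Any) -> Path:
    """Extract a local tar archive into a sibling directory and return it."""
    # Default the directory name to the stem of the archive file name.
    target = path.parent / (directory or path.stem)
    # Assumption: an existing target directory means extraction already happened.
    if not target.exists():
        target.mkdir(parents=True)
        with tarfile.open(path) as tar_file:
            tar_file.extractall(target, **extract_kwargs)
    return target
```

Note that `Path.stem` only strips the last suffix, so `data.tar.gz` would map to a directory named `data.tar` in this sketch.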
- ensure_xml(*subkeys, url, name=None, force=False, download_kwargs=None, parse_kwargs=None)[source]
Download an XML file and open it with lxml.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
parse_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to lxml.etree.parse().
- Return type:
ElementTree
- Returns:
An ElementTree object
Warning
If you have lots of files to read in the same archive, it’s better just to unzip first.
- ensure_zip_df(*subkeys, url, inner_path, name=None, force=False, download_kwargs=None, read_csv_kwargs=None)[source]
Download a zip file and open an inner file as a dataframe with pandas.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
inner_path (str) – The relative path to the file inside the archive
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
read_csv_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pandas.read_csv().
- Returns:
A pandas DataFrame
- Return type:
- ensure_zip_np(*subkeys, url, inner_path, name=None, force=False, download_kwargs=None, load_kwargs=None)[source]
Download a zip file and open an inner file as an array-like with numpy.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
url (str) – The URL to download.
inner_path (str) – The relative path to the file inside the archive
name (Optional[str]) – Overrides the name of the file at the end of the URL, if given. Also useful for URLs that don’t have proper filenames with extensions.
force (bool) – Should the download be done again, even if the path already exists? Defaults to false.
download_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pystow.utils.download().
load_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments that are passed through to read_zip_np() and transitively to numpy.load().
- Returns:
An array-like object
- Return type:
numpy.typing.ArrayLike
- classmethod from_key(key, *subkeys, ensure_exists=True)[source]
Get a module for the given directory or one of its subdirectories.
- Parameters:
key (str) – The name of the module. No funny characters. The envvar <key>_HOME where key is uppercased is checked first before using the default home directory.
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
ensure_exists (bool) – Should all directories be created automatically? Defaults to true.
- Return type:
Module
- Returns:
A module
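The lookup order described for key (environment variable first, default location second) might look roughly like the sketch below. It is an illustration under stated assumptions rather than pystow's code: resolve_base is a hypothetical helper, the environment is passed in as a plain mapping, and placing the module at default_home / key is an assumed layout.

```python
from pathlib import Path
from typing import Mapping


def resolve_base(key: str, environ: Mapping[str, str], default_home: Path) -> Path:
    """Pick the base directory for ``key`` (hypothetical helper).

    The ``<KEY>_HOME`` variable is consulted first; the fallback layout
    ``default_home / key`` is an assumption made for this sketch.
    """
    override = environ.get(f"{key.upper()}_HOME")
    if override:
        return Path(override)
    return default_home / key


# In real use the mapping would typically be os.environ.
env_path = resolve_base("demo", {"DEMO_HOME": "/tmp/custom"}, Path("/data"))
default_path = resolve_base("demo", {}, Path("/data"))
```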
- join(*subkeys, name=None, ensure_exists=True)[source]
Get a subdirectory of the current module.
- Parameters:
- Return type:
- Returns:
The path of the directory or subdirectory for the given module.
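As a rough illustration of what joining subkeys and an optional name means, the sketch below builds the path with pathlib. The helper join_path is hypothetical and only mirrors the signature shown above; the assumption that ensure_exists creates the directories but never the file itself is this sketch's own.

```python
import tempfile
from pathlib import Path
from typing import Optional


def join_path(
    base: Path,
    *subkeys: str,
    name: Optional[str] = None,
    ensure_exists: bool = True,
) -> Path:
    """Join ``subkeys`` (and optionally a file ``name``) onto ``base``."""
    directory = base.joinpath(*subkeys)
    if ensure_exists:
        # Create the directory chain, but never the file itself.
        directory.mkdir(parents=True, exist_ok=True)
    return directory / name if name is not None else directory


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    file_path = join_path(base, "raw", "v1", name="table.tsv")
    directory_created = (base / "raw" / "v1").is_dir()
    file_created = file_path.exists()
    relative = file_path.relative_to(base).as_posix()
```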
- load_df(*subkeys, name, read_csv_kwargs=None)[source]
Open a pre-existing CSV as a dataframe with pandas.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
name (str) – The name of the file to open
read_csv_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pandas.read_csv().
- Return type:
DataFrame
- Returns:
A pandas DataFrame
- load_json(*subkeys, name, open_kwargs=None, json_load_kwargs=None)[source]
Open a JSON file with json.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
name (str) – The name of the file to open
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()
json_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to json.load().
- Return type:
- Returns:
A JSON object (list, dict, etc.)
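The two keyword mappings go to different calls: open_kwargs to open() and json_load_kwargs to json.load(). A minimal sketch of that pass-through, using a hypothetical load_json_file helper that takes a ready-made path instead of subkeys and name:

```python
import json
import tempfile
from decimal import Decimal
from pathlib import Path
from typing import Any, Mapping, Optional


def load_json_file(
    path: Path,
    open_kwargs: Optional[Mapping[str, Any]] = None,
    json_load_kwargs: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Open ``path`` and parse it as JSON, forwarding both kwarg mappings."""
    with open(path, **(open_kwargs or {})) as file:
        return json.load(file, **(json_load_kwargs or {}))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "record.json"
    path.write_text('{"name": "café", "score": 1.10}', encoding="utf-8")
    record = load_json_file(
        path,
        open_kwargs={"encoding": "utf-8"},  # forwarded to open()
        json_load_kwargs={"parse_float": Decimal},  # forwarded to json.load()
    )
```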
- load_pickle(*subkeys, name, mode='rb', open_kwargs=None, pickle_load_kwargs=None)[source]
Open a pickle file with pickle.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
name (str) – The name of the file to open
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()
pickle_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pickle.load().
- Return type:
- Returns:
Any object
- load_pickle_gz(*subkeys, name, mode='rb', open_kwargs=None, pickle_load_kwargs=None)[source]
Open a gzipped pickle file with pickle.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
name (str) – The name of the file to open
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to gzip.open()
pickle_load_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass through to pickle.load().
- Return type:
- Returns:
Any object
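Reading a gzipped pickle amounts to wrapping the file in gzip.open() before calling pickle.load(). The sketch below shows that combination with a hypothetical load_pickle_gz_file helper that accepts a full path; it is meant as an illustration, not as the library's code.

```python
import gzip
import pickle
import tempfile
from pathlib import Path
from typing import Any, Mapping, Optional


def load_pickle_gz_file(
    path: Path,
    mode: str = "rb",
    open_kwargs: Optional[Mapping[str, Any]] = None,
    pickle_load_kwargs: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Decompress ``path`` on the fly and unpickle its contents."""
    with gzip.open(path, mode=mode, **(open_kwargs or {})) as file:
        return pickle.load(file, **(pickle_load_kwargs or {}))


original = {"ids": [1, 2, 3], "label": "demo"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "object.pkl.gz"
    # Write a small gzipped pickle so there is something to read back.
    with gzip.open(path, mode="wb") as file:
        pickle.dump(original, file)
    restored = load_pickle_gz_file(path)
```

As with any use of pickle, only load files from sources you trust, since unpickling can execute arbitrary code.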
- load_rdf(*subkeys, name=None, parse_kwargs=None)[source]
Open an RDF file with rdflib.
- Parameters:
- Return type:
- Returns:
An RDF graph
- load_xml(*subkeys, name, parse_kwargs=None)[source]
Load an XML file with lxml.
- Parameters:
- Return type:
ElementTree
- Returns:
An ElementTree object
Warning
If you have lots of files to read in the same archive, it’s better just to unzip first.
- module(*subkeys, ensure_exists=True)[source]
Get a module for a subdirectory of the current module.
- Parameters:
- Return type:
- Returns:
A module representing the subdirectory based on the given subkeys.
- open(*subkeys, name, mode='r', open_kwargs=None, ensure_exists=False)[source]
Open a file that exists already.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
name (str) – The name of the file to open
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to open()
ensure_exists (bool) – Should the file be made? Set to true on write operations.
- Yields:
An open file object
- Return type:
- open_gz(*subkeys, name, mode='rt', open_kwargs=None, ensure_exists=False)[source]
Open a gzipped file that exists already.
- Parameters:
subkeys (str) – A sequence of additional strings to join. If none are given, returns the directory for this module.
name (str) – The name of the file to open
mode (str) – The read mode, passed to gzip.open()
open_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to gzip.open()
ensure_exists (bool) – Should the file be made? Set to true on write operations.
- Yields:
An open file object
- Return type: