ensure_tar_df

ensure_tar_df(key: str, *subkeys: str, url: str, inner_path: str, name: str | None = None, force: bool = False, download_kwargs: DownloadKwargs | None = None, read_csv_kwargs: Mapping[str, Any] | None = None) → pd.DataFrame

Download a tar file and open an inner file as a dataframe with pandas.

Parameters:
  • key – The module name, used as the top-level directory for the downloaded file.

  • subkeys – Additional directory names, joined in order beneath the module's directory. If none are given, the file is stored directly in the module's directory.

  • url – The URL to download.

  • inner_path – The path of the file to read, relative to the root of the archive.

  • name – If given, overrides the file name taken from the end of the URL. This is useful for URLs that do not end in a proper file name with an extension.

  • force – Whether to download the file again even if it already exists locally. Defaults to False.

  • download_kwargs – Keyword arguments to pass through to pystow.utils.download().

  • read_csv_kwargs – Keyword arguments to pass through to pandas.read_csv().
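As a rough illustration of how key, subkeys, url, name, and force interact, the sketch below computes where a downloaded file would be cached and decides whether a new download is needed. It assumes the file is stored under base/key/subkey1/subkey2/..., with base standing in for the data directory; resolve_cache_path and needs_download are hypothetical helpers, not part of pystow's API.

```python
from pathlib import Path, PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def resolve_cache_path(
    base: Path, key: str, *subkeys: str, url: str, name: Optional[str] = None
) -> Path:
    """Hypothetical helper: where a downloaded file would be cached.

    Assumes files live in base/key/subkey1/subkey2/...; the file name comes
    from the last segment of the URL path unless ``name`` overrides it.
    """
    directory = base.joinpath(key, *subkeys)
    if name is None:
        # Query strings and fragments are not part of urlparse(...).path.
        name = PurePosixPath(urlparse(url).path).name
    return directory / name


def needs_download(path: Path, force: bool = False) -> bool:
    """Hypothetical helper: fetch only when forced or not already cached."""
    return force or not path.is_file()
```

For example, a URL such as https://example.com/download?id=42 yields the bare name "download" with no extension, which is the situation where passing name="data.tar.gz" helps.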

Returns:

A dataframe parsed from the file at inner_path.
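Once the archive is available locally, the remaining work is to locate inner_path inside it and hand that file to pandas. The following is a minimal sketch of that step using the standard-library tarfile module; read_tar_member_df is a hypothetical name, and this illustrates the technique rather than reproducing pystow's own implementation.

```python
import tarfile
from typing import Any, Mapping, Optional

import pandas as pd


def read_tar_member_df(
    archive_path: str,
    inner_path: str,
    read_csv_kwargs: Optional[Mapping[str, Any]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: parse one file inside a tar archive with pandas.

    The member is streamed from the archive, so nothing is written to disk.
    """
    # Mode "r" transparently handles uncompressed, gzip, bz2 and xz archives.
    with tarfile.open(archive_path, "r") as tar:
        # Raises KeyError if inner_path is not in the archive.
        member = tar.extractfile(inner_path)
        if member is None:  # e.g. inner_path names a directory
            raise ValueError(f"{inner_path!r} is not a regular file")
        with member:
            return pd.read_csv(member, **(read_csv_kwargs or {}))
```

Passing read_csv_kwargs={"sep": "\t"}, for instance, would parse a tab-separated member.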

Warning

If you need to read many files from the same archive, it is better to extract the archive once first, since each call opens the archive again to find its inner file.
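In practice, extracting first might look like the sketch below: unpack the archive once into a directory, then read each file from disk. read_many_members is a hypothetical helper. On Python versions that provide tarfile.data_filter, the "data" filter rejects members that would escape the target directory; archives should still come from a trusted source.

```python
import tarfile
from pathlib import Path
from typing import Dict, Iterable

import pandas as pd


def read_many_members(
    archive_path: str, inner_paths: Iterable[str], directory: str, **read_csv_kwargs
) -> Dict[str, pd.DataFrame]:
    """Hypothetical helper: extract a tar archive once, then read several files."""
    target = Path(directory)
    with tarfile.open(archive_path, "r") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, "..", device files, etc.
            tar.extractall(target, filter="data")
        else:
            tar.extractall(target)
    # Every file is now a plain file on disk, so each read skips the archive.
    return {p: pd.read_csv(target / p, **read_csv_kwargs) for p in inner_paths}
```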