Web scraping utilities for DWR, USACE, USGS, CNRFC, CVO and SacALERT data repositories.
Create a virtual environment with Python 3's built-in venv library.
$ python -m venv ~/.virtualenvs/myenv
Activate it from the directory that contains myenv (~/.virtualenvs in the example above) with
$ myenv\Scripts\activate.bat (Windows)
or $ source myenv/bin/activate (macOS or Linux).
Alternative tools for managing Python environments include:
- pyenv
- virtualenvwrapper
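To confirm that the environment is actually active before installing anything, one option is to ask the interpreter itself. The following is a minimal sketch, assuming the environment was created with venv as shown above; the function name in_virtualenv is hypothetical and is not part of collect.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If this reports that no virtual environment is active, the activate step probably ran in a different shell session or against a different path.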
$ git clone https://github.com/MBKEngineers/collect.git
Use the "editable" flag (-e) so that changes in your local repo take effect in your virtualenv without reinstalling.
$ cd collect
$ python -m pip install -e .
If you plan to use the collect.cvo module, which depends on tabula-py, you will need to install Java. Follow the instructions at: https://tabula-py.readthedocs.io/en/latest/getting_started.html
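A quick way to check whether Java is reachable from Python before using collect.cvo is sketched below. It uses only the standard library; the function name java_available is hypothetical and is not part of collect or tabula-py. Note that on some systems a placeholder java executable exists on PATH even when no Java runtime is installed, so the script also checks the exit code of `java -version`.

```python
import shutil
import subprocess


def java_available() -> bool:
    """Return True when an executable named 'java' is found on PATH."""
    return shutil.which("java") is not None


if __name__ == "__main__":
    if not java_available():
        print("Java was not found on PATH; install it before using collect.cvo.")
    else:
        # `java -version` writes its report to stderr rather than stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
        if result.returncode == 0:
            print(result.stderr.strip() or result.stdout.strip())
        else:
            print("A 'java' executable exists but did not run successfully.")
```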
Add username and password credentials to a .env file to enable downloading data from password-protected sources.
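As an illustration of the KEY=VALUE layout a .env file typically uses, the sketch below reads such a file with the standard library only. The variable names EXAMPLE_USERNAME and EXAMPLE_PASSWORD are placeholders, not the names collect expects, and read_env_file is a hypothetical helper that is independent of however collect itself loads the file.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        env_path = Path(tmp) / ".env"
        # Placeholder names and values, for illustration only.
        env_path.write_text(
            "# credentials for a password-protected source\n"
            "EXAMPLE_USERNAME=jdoe\n"
            "EXAMPLE_PASSWORD='not-a-real-password'\n"
        )
        creds = read_env_file(env_path)
        print(sorted(creds))
```

Keep the .env file out of version control so that credentials are not committed to the repo.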
The collect module uses Sphinx to generate documentation from docstrings in the project. To update and access the documentation files, make sure that Sphinx is installed:
$ python -m pip install -e ".[docs]"
Note that there is one other Python package on PyPI named collect. However, it is unmaintained and dates from 2011, so the MBK codebase is not expected to use it.
collect now includes a command-line interface, collect-start, for starting a new module. Initialize a new module from a template with
$ collect-start modulename