CSV Database

An implementation for saving and loading candle data (timestamp, open, high, low, close, volume) in .csv files.

class src.boatwright.Data.CSVdatabase.CSVdatabase(source, dir=None, debug=False)

Read and write candles (OHLCV) data to and from .csv files.

Parameters:
  • source (str) – e.g. COINBASE, ALPACA, YAHOO, etc.

  • dir (str) – directory path, defaults to location specified in config.json “DATA_DIR”

  • debug (bool) – boolean toggle for printing debugging information

Note

Candle Database Structure (see the sketch after this list):

  • SOURCE
    • SYMBOL
      • YEAR
        • MONTH
          • data.csv

          • DAY
            • data.csv

            • HOUR
              • data.csv
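
For example, with source “COINBASE” the on-disk layout for minute candles would resemble the sketch below. The import path follows the class path shown above, and the exact directory and file names are an assumption inferred from the structure note, not taken from the code.

    from src.boatwright.Data.CSVdatabase import CSVdatabase

    # dir defaults to the "DATA_DIR" entry in config.json
    db = CSVdatabase(source="COINBASE", debug=True)

    # Minute candles for BTC on 2023-05-02 14:00 are expected to live in
    # a file resembling:
    #   <DATA_DIR>/COINBASE/BTC/2023/05/02/14/data.csv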

calc_prerequisite_start(start, prerequisite_data_length, granularity, granularity_unit)

calculate the date such that the prerequisite number of bars is loaded before the start date

Parameters:
  • start (datetime) – start date

  • prerequisite_data_length (int) – number of bars to be loaded before start date

  • granularity (int) – data granularity

  • granularity_unit (str) – “MINUTE”, “HOUR” or “DAY”

Returns:

datetime
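
A usage sketch, assuming db is a CSVdatabase instance as constructed in the example above:

    from datetime import datetime

    # 200 five-minute bars before the requested start should push the
    # prerequisite start back by roughly 200 * 5 = 1000 minutes.
    pre_start = db.calc_prerequisite_start(
        start=datetime(2023, 5, 2, 9, 30),
        prerequisite_data_length=200,
        granularity=5,
        granularity_unit="MINUTE",
    )
    print(pre_start)  # expected to fall on the previous day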

get_filepath(symbol, date, level='hour')

for the given symbol/date, returns the filepath at the specified level

Parameters:
  • symbol (str) – e.g. “AAPL” or “BTC”

  • date (datetime) – date to generate the filepath for

  • level (str) – “year”, “month”, “day”, or “hour”
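
A usage sketch, assuming db from the constructor example. The exact strings returned (separators, the trailing data.csv) are an assumption based on the structure note above:

    from datetime import datetime

    date = datetime(2024, 1, 15, 10)

    # Hour-level file for AAPL candles, expected to resemble
    #   <DATA_DIR>/<SOURCE>/AAPL/2024/01/15/10/data.csv
    hour_fp = db.get_filepath("AAPL", date, level="hour")

    # Coarser levels stop higher up the tree, e.g. at the month directory.
    month_fp = db.get_filepath("AAPL", date, level="month")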

load(symbol, start, end, prerequisite_data_length=0, granularity=1, granularity_unit='MINUTE', verbose=False)

load data from the ‘database’

Parameters:
  • symbol (str) – e.g. “BTC-USD”

  • start (datetime) – start date of data to collect

  • end (datetime) – end date of data to collect

  • prerequisite_data_length (int) – number of extra bars to load before start

  • granularity (int) – time between bars, in units of granularity_unit

  • granularity_unit (str) – “DAY”, “HOUR” or “MINUTE”

  • verbose (bool) – boolean toggle for printing progress to terminal

Returns:

pd.DataFrame with columns [timestamp, datetime, open, high, low, close, volume]

Note

granularity=5, granularity_unit=“MINUTE” yields 5-minute bars
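
A usage sketch, assuming db from the constructor example:

    from datetime import datetime

    # Load 5-minute BTC-USD bars for April 2023, plus 100 extra bars
    # before the start (e.g. for indicator warm-up).
    df = db.load(
        symbol="BTC-USD",
        start=datetime(2023, 4, 1),
        end=datetime(2023, 5, 1),
        prerequisite_data_length=100,
        granularity=5,
        granularity_unit="MINUTE",
        verbose=True,
    )
    print(df.columns)  # timestamp, datetime, open, high, low, close, volume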

make_date_chunks(data, granularity_unit)

takes a dataframe and returns a list of dataframes, where each is a chunk of data with respect to the granularity_unit

Parameters:
  • data (DataFrame) – data to be chunked

  • granularity_unit (str) – “MINUTE”, “HOUR”, or “DAY”
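
The intent is roughly equivalent to grouping the frame by calendar period, as in the pandas sketch below. This is an illustration of the idea only, not the library’s implementation; it assumes the frame has the datetime column described under load(), and the mapping from granularity_unit to chunk size (minute bars per hour-level file, hourly bars per day-level file, daily bars per month-level file) is an assumption inferred from the database structure note.

    import pandas as pd

    def chunk_by_unit(data: pd.DataFrame, granularity_unit: str) -> list[pd.DataFrame]:
        # Hypothetical mapping from bar granularity to the period one
        # file is assumed to cover.
        freq = {"MINUTE": "H", "HOUR": "D", "DAY": "M"}[granularity_unit]
        periods = pd.to_datetime(data["datetime"]).dt.to_period(freq)
        return [chunk for _, chunk in data.groupby(periods)]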

save(symbol, data, granularity_unit, verbose=False)

save data to .csv

Parameters:
  • symbol (str) – e.g. “BTC” or “AAPL”

  • data (DataFrame) – timestamp, datetime, open, high, low, close, volume data

  • granularity_unit (str) – either “MINUTE”, “HOUR”, or “DAY”

  • verbose (bool) – boolean toggle to print progress to terminal
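
A usage sketch, assuming db from the constructor example and df as returned by load() above:

    # Persist a frame of minute candles; files are created or appended
    # under the directory structure described at the top of this page.
    db.save(symbol="BTC-USD", data=df, granularity_unit="MINUTE", verbose=True)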

write(symbol, data, level)

writes data to the appropriate file. Assumes data is all part of the same chunk

Parameters:
  • symbol (str) – e.g. “AAPL” or “BTC”

  • data (DataFrame) – data, assumed to be one chunk of data to be appended to a single file

  • level (str) – file path level, either “month”, “day”, or “hour”
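
A lower-level sketch combining make_date_chunks() and write(), assuming db and df as above. save() is the usual entry point and is assumed to perform a similar loop internally; the MINUTE-to-“hour” pairing is likewise an assumption based on the structure note.

    # Split minute bars into per-file chunks and append each chunk to
    # its hour-level file.
    for chunk in db.make_date_chunks(df, granularity_unit="MINUTE"):
        db.write("BTC-USD", chunk, level="hour")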