Inout Module

HiDi’s inout module provides transforms for reading data from disk and writing data to disk.

class hidi.inout.ReadTransform(infiles, **kwargs)

Bases: hidi.transform.Transform

Read input csv data from disk.

Input data should be a csv file with three columns: link_id, item_id, and score. If score is not provided, it defaults to one. In the context of traditional collaborative filtering, link_id represents the “user” and item_id represents the “item”.

Parameters: infiles (array) – Array of paths to csv documents to be loaded and concatenated into one DataFrame. Each csv document must have a link_id and an item_id column; an optional score column may also be supplied.

transform(**kwargs)

Read in files from the infiles array given upon instantiation.

Return type: pandas.DataFrame
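As an illustration of the input contract described above, here is a minimal pandas sketch of reading several csv files, checking for the required columns, defaulting a missing score to 1, and concatenating the results. The helper name read_interactions is hypothetical and this is not HiDi's implementation; in particular, filling score per file (rather than after concatenation) and raising ValueError on missing columns are assumptions.

```python
import pandas as pd

REQUIRED = {"link_id", "item_id"}


def read_interactions(infiles):
    """Hypothetical helper approximating what ReadTransform is documented to do."""
    frames = []
    for path in infiles:
        df = pd.read_csv(path)
        # Assumption: a file lacking link_id or item_id is rejected outright.
        missing = REQUIRED - set(df.columns)
        if missing:
            raise ValueError(f"{path} is missing columns: {sorted(missing)}")
        # A file without a score column gets a score of 1 for every row.
        if "score" not in df.columns:
            df["score"] = 1
        frames.append(df[["link_id", "item_id", "score"]])
    # Concatenate every file into one DataFrame with a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Filling the default per file means a mix of scored and unscored files still yields a numeric score column, instead of NaN values for the rows that came from unscored files.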
class hidi.inout.WriteTransform(outfile, file_format='csv', enc=None, link_key='link_id')

Bases: hidi.transform.Transform

Write output to disk in csv or json formats.

Parameters:
  • outfile (str) – Path to the desired output file on the file system.
  • file_format (str) – Output file format (file extension), either json or csv.

transform(df, **kwargs)

Write a DataFrame to a file.

Parameters: df (pandas.DataFrame) – The Pandas DataFrame to be written to a file.
Return type: pandas.DataFrame
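The format switch described above can be sketched with pandas' own writers. This is a minimal sketch, not HiDi's implementation. The helper name write_interactions is hypothetical. The choices of index=False for csv and orient="records" for json are assumptions, and the enc and link_key parameters from the signature are not modeled. Returning the input DataFrame unchanged is also an assumption, consistent with the documented return type.

```python
import pandas as pd


def write_interactions(df, outfile, file_format="csv"):
    """Hypothetical helper approximating WriteTransform.transform."""
    if file_format == "csv":
        # Assumption: the row index is not written to the file.
        df.to_csv(outfile, index=False)
    elif file_format == "json":
        # Assumption: one JSON object per row, collected in a list.
        df.to_json(outfile, orient="records")
    else:
        raise ValueError(f"unsupported file_format: {file_format!r}")
    # Hand the same DataFrame back so a later pipeline step can use it.
    return df
```

Returning the input frame lets a write step sit in the middle of a chain of transforms without interrupting the data flow.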