Inout Module

HiDi’s inout module provides transforms for reading input data from disk and writing results back to disk.

class hidi.inout.ReadTransform(infiles, **kwargs)

Bases: hidi.transform.Transform

Read input csv data from disk.

Input data should be a csv file with three columns: link_id, item_id, and score. If score is not provided, it defaults to one. In the context of traditional collaborative filtering, link_id represents the “user” and item_id represents the “item”.

Parameters:	infiles (array) – Array of paths to csv documents to be loaded and concatenated into one DataFrame. Each csv document must have a `link_id` and an `item_id` column. An optional `score` column may also be supplied.

transform(**kwargs)

Read in the files from the infiles array given at instantiation.
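The input convention described above can be illustrated without HiDi itself. The following is a minimal standard-library sketch, not HiDi's implementation: `read_link_item_csvs` is a hypothetical helper, it returns a list of dicts rather than a DataFrame, and the file names and sample values are made up for the demonstration. It shows several csv files being concatenated and a missing `score` falling back to one.

```python
import csv
import os
import tempfile


def read_link_item_csvs(paths):
    """Load several csv files into one list of row dicts.

    Hypothetical helper, not HiDi code. A missing or empty `score`
    is filled with 1.0, mirroring the default described above.
    """
    rows = []
    for path in paths:
        with open(path, newline="") as handle:
            for record in csv.DictReader(handle):
                score = record.get("score")  # None if the column is absent
                rows.append({
                    "link_id": record["link_id"],
                    "item_id": record["item_id"],
                    "score": float(score) if score else 1.0,
                })
    return rows


# Demo with made-up data: one file carries scores, the other does not.
with tempfile.TemporaryDirectory() as tmp:
    scored = os.path.join(tmp, "scored.csv")
    unscored = os.path.join(tmp, "unscored.csv")
    with open(scored, "w", newline="") as handle:
        handle.write("link_id,item_id,score\nu1,a,3\nu2,b,0.5\n")
    with open(unscored, "w", newline="") as handle:
        handle.write("link_id,item_id\nu1,c\n")
    rows = read_link_item_csvs([scored, unscored])

for row in rows:
    print(row)
```

Rows from the file without a `score` column come back with a score of 1.0, while rows from the scored file keep their values.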

class hidi.inout.WriteTransform(outfile, file_format='csv', enc=None, link_key='link_id')

Bases: hidi.transform.Transform

Write output to disk in csv or json formats.

Parameters:	outfile (str) – Path to the desired output file on the file system.
	file_format (str) – File extension naming the output format, either `json` or `csv`.

transform(df, **kwargs)

Write a DataFrame to a file.

Parameters:	df (pandas.DataFrame) – The pandas DataFrame to be written to a file.
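The idea of selecting the output format from a `file_format` string can be sketched with the standard library alone. This is an illustration under stated assumptions, not HiDi's implementation: `write_rows` is a hypothetical helper, it takes a list of dicts instead of a DataFrame, and the exact layout HiDi writes for csv or json is not documented here, so the layout below is only one plausible choice.

```python
import csv
import json
import os
import tempfile


def write_rows(rows, fieldnames, outfile, file_format="csv"):
    """Write row dicts to `outfile` as csv or json.

    Hypothetical helper, not HiDi code; the on-disk layout is an assumption.
    """
    if file_format == "csv":
        with open(outfile, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    elif file_format == "json":
        with open(outfile, "w") as handle:
            json.dump(rows, handle)
    else:
        raise ValueError("file_format must be 'csv' or 'json'")


# Demo with made-up data: write the same rows in both formats and read them back.
sample = [{"link_id": "u1", "item_id": "a", "score": 3.0}]
columns = ["link_id", "item_id", "score"]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "out.csv")
    json_path = os.path.join(tmp, "out.json")
    write_rows(sample, columns, csv_path, file_format="csv")
    write_rows(sample, columns, json_path, file_format="json")
    with open(csv_path) as handle:
        csv_lines = handle.read().splitlines()
    with open(json_path) as handle:
        json_rows = json.load(handle)

print(csv_lines)
print(json_rows)
```

Rejecting unknown format strings with a `ValueError` is a design choice of this sketch; how `WriteTransform` handles other values is not stated in this reference.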