HiDi’s pipeline module exposes functionality for performing IO tasks.
Read input csv data from disk.
Input data should be a csv file formatted with three columns: link_id, item_id, and score. If score is not provided, it defaults to one.
link_id represents the “user” and item_id represents the “item” in the context of traditional collaborative filtering.
Parameters: infiles (array) – Array of paths to csv documents to be loaded and concatenated into one DataFrame. Each csv document must have a link_id column and an item_id column. An optional score column may also be supplied.
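The input format described above can be illustrated with plain pandas. This is a minimal sketch, not HiDi's own implementation: the helper name load_links and the sample rows are hypothetical, and the column check is an assumption about how a missing required column might be reported.

```python
import io

import pandas as pd

# Hypothetical sample data: the second source omits the optional score column.
with_score = io.StringIO("link_id,item_id,score\nu1,a,3\nu2,b,5\n")
without_score = io.StringIO("link_id,item_id\nu1,c\nu3,a\n")


def load_links(source):
    """Read one csv source and default a missing score column to one."""
    df = pd.read_csv(source)
    # link_id and item_id are required; score is optional.
    missing = {"link_id", "item_id"} - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    if "score" not in df.columns:
        df["score"] = 1
    return df[["link_id", "item_id", "score"]]


scored = load_links(with_score)
unscored = load_links(without_score)
```

Both frames end up with the same three columns, so they can be concatenated without further alignment.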
Read in files from the infiles array given upon instantiation.
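Loading several csv documents and concatenating them into one DataFrame might look roughly like the following. It is a sketch in plain pandas under stated assumptions: read_infiles is a hypothetical helper, not part of HiDi, and the two throwaway files exist only for the demonstration.

```python
import os
import tempfile

import pandas as pd


def read_infiles(infiles):
    """Load each csv path and concatenate the results into one DataFrame."""
    frames = [pd.read_csv(path) for path in infiles]
    # ignore_index gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Demo with two temporary files holding hypothetical data.
samples = [
    ("a.csv", "link_id,item_id,score\nu1,a,1\n"),
    ("b.csv", "link_id,item_id,score\nu2,b,2\nu3,c,4\n"),
]
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, body in samples:
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(body)
        paths.append(path)
    combined = read_infiles(paths)
```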
WriteTransform(outfile, file_format='csv', enc=None, link_key='link_id')
Write output to disk in csv or json formats.
- outfile (str) – A string that is a path to the desired output on the file system.
- file_format (str) – A string that is a file extension, either csv or json.
Write a DataFrame to a file.
Parameters: df (pandas.DataFrame) – The Pandas DataFrame to be written to a file
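
Writing a DataFrame to disk in either format can be sketched with pandas alone. The helper write_frame below is hypothetical and only mirrors the outfile and file_format parameters; it does not model enc or link_key, and the choices of index=False and orient="records" are assumptions rather than HiDi's documented behavior.

```python
import os
import tempfile

import pandas as pd


def write_frame(df, outfile, file_format="csv"):
    """Write df to outfile as csv or json, chosen by file_format."""
    if file_format == "csv":
        df.to_csv(outfile, index=False)
    elif file_format == "json":
        df.to_json(outfile, orient="records")
    else:
        raise ValueError(f"unsupported file_format: {file_format!r}")


# Round-trip demo with hypothetical data.
frame = pd.DataFrame(
    {"link_id": ["u1", "u2"], "item_id": ["a", "b"], "score": [1, 2]}
)
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "out.csv")
    json_path = os.path.join(tmp, "out.json")
    write_frame(frame, csv_path, file_format="csv")
    write_frame(frame, json_path, file_format="json")
    csv_back = pd.read_csv(csv_path)
    json_back = pd.read_json(json_path)
```

Rejecting unknown formats with an error, instead of silently falling back to csv, keeps a mistyped extension from producing an unexpected file.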