Inout Module
HiDi’s inout module exposes functionality for performing I/O tasks.
class hidi.inout.ReadTransform(infiles, **kwargs)
Bases: hidi.transform.Transform

Read input csv data from disk.

Input data should be a csv file formatted with three columns: link_id, item_id, and score. If score is not provided, it defaults to one. link_id represents the “user” and item_id represents the “item” in the context of traditional collaborative filtering.

Parameters:
- infiles (array) – Array of paths to csv documents to be loaded and concatenated into one DataFrame. Each csv document must have a link_id and an item_id column. An optional score column may also be supplied.
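
The input format described above can be illustrated with a small pandas sketch. This is not HiDi's implementation: the helper name read_link_item_csvs is hypothetical, and the in-memory buffers stand in for csv files on disk. It only shows the documented behaviour of concatenating several csv documents and defaulting a missing score to one.

```python
import io

import pandas as pd


def read_link_item_csvs(sources):
    """Hypothetical helper: load csv sources and concatenate them.

    Mirrors the documented input contract (link_id, item_id, optional
    score); it is an illustration, not HiDi's own code.
    """
    frames = [pd.read_csv(src) for src in sources]
    df = pd.concat(frames, ignore_index=True)

    # Default score to one, both when the column is absent everywhere
    # and when only some of the documents supplied it.
    if "score" not in df.columns:
        df["score"] = 1
    else:
        df["score"] = df["score"].fillna(1)

    return df[["link_id", "item_id", "score"]]


# In-memory stand-ins for csv documents on disk (assumed sample data).
with_scores = io.StringIO("link_id,item_id,score\nu1,a,3\nu2,b,5\n")
without_scores = io.StringIO("link_id,item_id\nu1,c\n")

df = read_link_item_csvs([with_scores, without_scores])
print(df)
```

Rows from the document without a score column end up with a score of one, so downstream steps can treat every (link_id, item_id) pair uniformly.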
class hidi.inout.WriteTransform(outfile, file_format='csv', enc=None, link_key='link_id')
Bases: hidi.transform.Transform

Write output to disk in csv or json formats.

Parameters:
- outfile (str) – A string that is a path to the desired output on the file system.
- file_format (str) – A string that is a file extension, either json or csv.
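
As a rough illustration of choosing between the two output formats, the sketch below dispatches on a file_format string using plain pandas. The helper write_frame is hypothetical and is not HiDi's WriteTransform; the records orientation used for json is an assumption, and the buffers stand in for an outfile path.

```python
import io
import json

import pandas as pd


def write_frame(df, out, file_format="csv"):
    """Hypothetical helper: write a DataFrame as csv or json.

    `out` may be a path or a file-like object. The json layout
    (orient="records") is an assumption made for this sketch.
    """
    if file_format == "csv":
        df.to_csv(out, index=False)
    elif file_format == "json":
        df.to_json(out, orient="records")
    else:
        raise ValueError("file_format must be 'csv' or 'json'")


# Assumed sample data in the link_id / item_id / score layout.
frame = pd.DataFrame({"link_id": ["u1"], "item_id": ["a"], "score": [3]})

csv_buf = io.StringIO()
write_frame(frame, csv_buf, file_format="csv")
csv_lines = csv_buf.getvalue().splitlines()

json_buf = io.StringIO()
write_frame(frame, json_buf, file_format="json")
json_records = json.loads(json_buf.getvalue())
```

Rejecting any other format string up front keeps a typo in file_format from silently producing an unexpected file.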