Inout Module
HiDi’s inout module exposes functionality for performing I/O tasks: reading input data from disk and writing output to disk.
class hidi.inout.ReadTransform(infiles, **kwargs)
Bases: hidi.transform.Transform
Read input csv data from disk.
Input data should be a csv file formatted with three columns: link_id, item_id, and score. If score is not provided, it defaults to one. link_id represents the “user” and item_id represents the “item” in the context of traditional collaborative filtering.
Parameters: infiles (array) – Array of paths to csv documents to be loaded and concatenated into one DataFrame. Each csv document must have a link_id and an item_id column. An optional score column may also be supplied.
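The input format described above can be illustrated with a small standard-library sketch. This is not HiDi's implementation: `read_link_item_rows` is a hypothetical helper, it returns a list of dicts rather than a DataFrame, and treating an empty score cell the same as a missing score column is an assumption.

```python
import csv
import io


def read_link_item_rows(infiles):
    """Hypothetical helper (not part of HiDi): merge csv inputs into one list of rows."""
    rows = []
    for infile in infiles:
        # Each input needs link_id and item_id columns; score is optional.
        for record in csv.DictReader(infile):
            score = record.get("score")
            rows.append({
                "link_id": record["link_id"],
                "item_id": record["item_id"],
                # Assumption: a missing or empty score defaults to 1.
                "score": float(score) if score not in (None, "") else 1.0,
            })
    return rows


# In-memory stand-ins for csv files on disk.
scored = io.StringIO("link_id,item_id,score\nu1,a,3\nu2,b,5\n")
unscored = io.StringIO("link_id,item_id\nu3,a\n")

rows = read_link_item_rows([scored, unscored])
for row in rows:
    print(row)
```

The second input has no score column, so its single row receives the default score while the rows from the first input keep their own values.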
class hidi.inout.WriteTransform(outfile, file_format='csv', enc=None, link_key='link_id')
Bases: hidi.transform.Transform
Write output to disk in csv or json formats.
Parameters:
- outfile (str) – Path to the desired output file on the file system.
- file_format (str) – File extension selecting the output format, either json or csv. Defaults to csv.
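As a rough illustration of choosing an output format by file extension, the following standard-library sketch writes the same rows as csv or as json. It is not HiDi's implementation: `write_rows` is a hypothetical helper, it writes to an open file object instead of a path, and it does not model the enc or link_key arguments.

```python
import csv
import io
import json


def write_rows(rows, out, file_format="csv"):
    """Hypothetical helper (not part of HiDi): write a non-empty list of dict rows."""
    if file_format == "csv":
        # Column order is taken from the first row.
        writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()),
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    elif file_format == "json":
        json.dump(rows, out)
    else:
        raise ValueError("file_format must be 'csv' or 'json', got %r" % file_format)


rows = [{"link_id": "u1", "score": 0.5}, {"link_id": "u2", "score": 1.0}]

csv_out = io.StringIO()
write_rows(rows, csv_out, file_format="csv")

json_out = io.StringIO()
write_rows(rows, json_out, file_format="json")

print(csv_out.getvalue())
print(json_out.getvalue())
```

Rejecting any other extension up front keeps a typo in file_format from silently producing output in the wrong format.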