Clean Module

HiDi’s clean module exposes functionality for cleaning data.

class hidi.clean.DedupeTransform(skip_dedupe=False)

Bases: hidi.transform.Transform

Deduplicates a tall, skinny link-item DataFrame.

transform(df, **kwargs)

Takes a DataFrame df that has link_id and item_id columns and deduplicates its rows so that each (link_id, item_id) pair appears at most once.
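
The effect described above can be illustrated with plain pandas. This is a minimal sketch of the behavior, not HiDi's own implementation: the helper name dedupe_pairs and the sample data are hypothetical, and keeping the first occurrence of each pair is an assumption.

```python
import pandas as pd


def dedupe_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per (link_id, item_id) pair.

    Hypothetical helper for illustration only; it keeps the first
    occurrence of each pair, which is an assumption.
    """
    deduped = df.drop_duplicates(subset=["link_id", "item_id"])
    # Renumber the rows so the index is contiguous again.
    return deduped.reset_index(drop=True)


# Made-up sample data: the pair ("a", 1) appears twice.
df = pd.DataFrame({
    "link_id": ["a", "a", "b", "a"],
    "item_id": [1, 1, 1, 2],
})

deduped = dedupe_pairs(df)
print(deduped)
```

Only the link_id and item_id columns are compared here, so rows that differ in some other column would still collapse into one row per pair.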