HiDi: Pipelines for Embeddings

HiDi is a library for high-dimensional embedding generation for collaborative filtering applications.

Why HiDi?

We created HiDi because generating embeddings for collaborative filtering applications is a labor-intensive process. It involves many data transformations, each of which requires special consideration to get a good result. HiDi simplifies the process by breaking the work into small steps, each of which can be executed in a pipeline.

The unit of work in HiDi is a Transformer. Transformers need only implement one function, transform.
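To make the idea concrete, here is a minimal sketch of the pattern in plain Python. It does not use HiDi itself: the class names `Lowercase`, `DropDuplicates`, and `SimplePipeline` are hypothetical, and the single-argument `transform(data)` signature is an assumption for illustration, so HiDi's own `transform` may take or return additional values.

```python
class Lowercase(object):
    """Hypothetical step: lowercase every string in a list."""

    def transform(self, data):
        return [value.lower() for value in data]


class DropDuplicates(object):
    """Hypothetical step: drop repeated values, keeping first-seen order."""

    def transform(self, data):
        seen = set()
        unique = []
        for value in data:
            if value not in seen:
                seen.add(value)
                unique.append(value)
        return unique


class SimplePipeline(object):
    """Hypothetical pipeline: feed each step's output into the next step."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def run(self, data):
        for step in self.transforms:
            data = step.transform(data)
        return data


result = SimplePipeline([Lowercase(), DropDuplicates()]).run(["A", "a", "B"])
```

Because every step exposes the same one-method interface, steps can be reordered, swapped, or tested in isolation without touching the pipeline code.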

OK, How Do I Use It?

This will get you started.

from hidi import inout, clean, matrix, pipeline

# CSV file with link_id and item_id columns
in_files = ['hidi/examples/data/user-item.csv']

# File to write output data to
outfile = 'embeddings.csv'

transforms = [
    inout.ReadTransform(in_files),      # Read data from disk
    clean.DedupeTransform(),            # Dedupe it
    matrix.SparseTransform(),           # Make a sparse user*item matrix
    matrix.SimilarityTransform(),       # To item*item similarity matrix
    matrix.SVDTransform(),              # Perform SVD dimensionality reduction
    matrix.ItemsMatrixToDFTransform(),  # Make a DataFrame with an index
    inout.WriteTransform(outfile)       # Write results to csv
]

pl = pipeline.Pipeline(transforms)
pl.run()
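
For intuition about what the matrix steps compute, the sketch below walks through the same sequence on a toy dataset using NumPy directly. It is an illustration under stated assumptions, not HiDi's implementation: the interaction data is made up, similarity is taken to be plain co-occurrence counts, and a dense array stands in for the sparse matrix.

```python
import numpy as np

# Toy (user index, item index) interactions, already deduplicated.
interactions = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2)]
n_users, n_items = 3, 3

# User*item matrix: 1.0 where a user interacted with an item.
user_item = np.zeros((n_users, n_items))
for user, item in interactions:
    user_item[user, item] = 1.0

# Item*item similarity, assumed here to be co-occurrence counts.
item_item = user_item.T.dot(user_item)

# Dimensionality reduction: keep the top-k singular directions,
# giving one k-dimensional embedding per item.
k = 2
u, s, _ = np.linalg.svd(item_item)
embeddings = u[:, :k] * s[:k]
```

Each row of `embeddings` corresponds to one item; items that co-occur with similar sets of other items end up with similar rows.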


Setup

Requirements

HiDi is tested against CPython 2.7, 3.4, 3.5, and 3.6. It may also work with other versions of CPython.

Installation

To install HiDi, run:

$ pip install hidi