Pandas Cheat Sheet


1. Collect a DataFrame as a dictionary

df.set_index(['a', 'b']).T.to_dict('list')
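
A minimal sketch of the one-liner above, assuming a small made-up frame (the columns c and d and all values are illustrative only). Each (a, b) pair becomes a tuple key and the remaining columns become the list of values:

```python
import pandas as pd

# Hypothetical sample data; only the index columns 'a' and 'b' come from the note above.
df = pd.DataFrame({
    "a": [1, 2],
    "b": ["x", "y"],
    "c": [10, 20],
    "d": [30, 40],
})

# Index by (a, b), transpose so every index pair becomes a column,
# then collect each column as a list of the remaining values.
result = df.set_index(["a", "b"]).T.to_dict("list")
print(result)
```

The (a, b) pairs should be unique; with duplicate pairs the transposed frame has duplicate column labels and to_dict cannot keep every row.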

2. Read in a CSV file (and transpose)

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
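
A rough sketch of reading a CSV and transposing it, assuming the first column holds row labels. The CSV text and its column names are made up, and io.StringIO stands in for a real file path:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,jan,feb\nalice,1,2\nbob,3,4\n"

# read_csv accepts a path or a file-like object. index_col=0 turns the
# first column into row labels, so they become column names after .T
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
transposed = df.T
print(transposed)
```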

3. Count rows with the same value

df[col].value_counts()
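
For illustration, a small sketch on made-up data (the column name "color" is an assumption). value_counts sorts by count, most frequent first, and normalize=True returns fractions instead of counts:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"color": ["red", "blue", "red", "green", "red", "blue"]})
col = "color"

# Number of rows per distinct value, most frequent first.
counts = df[col].value_counts()
print(counts)

# Same breakdown as a share of all rows.
shares = df[col].value_counts(normalize=True)
print(shares)
```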

4. Display all columns without wrapping (similar to Spark's df.show(truncate=False))

pd.set_option('display.expand_frame_repr', False)

Other settings ('display.height' no longer exists in current pandas; 'display.max_rows' covers it):
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
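
If the settings should apply only to one print rather than the whole session, pd.option_context is one way to scope them. A sketch with a made-up 30-column frame:

```python
import pandas as pd

# Hypothetical wide frame: one row, 30 columns.
wide = pd.DataFrame([range(30)], columns=[f"col{i}" for i in range(30)])

before = pd.get_option("display.max_columns")

# Options set here are restored automatically when the block exits.
with pd.option_context(
    "display.max_columns", None,          # None means no column limit
    "display.width", 1000,
    "display.expand_frame_repr", False,   # do not wrap wide frames
):
    text = repr(wide)
    inside = pd.get_option("display.max_columns")
    print(text)

after = pd.get_option("display.max_columns")
```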

5. Remove all rows whose value occurs n times or fewer (keep values that occur more than n times)

df[df.groupby(value)['uid'].transform('size') > n]
or:
df.groupby(by=value).filter(lambda x: len(x) > n)
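
A sketch comparing both forms on made-up data, assuming value is the name of the grouping column and uid is just some other column in the frame. Both keep the original row order and index:

```python
import pandas as pd

# Hypothetical sample data: 'a' occurs 3 times, 'b' twice, 'c' once.
df = pd.DataFrame({
    "uid": [1, 2, 3, 4, 5, 6],
    "city": ["a", "a", "a", "b", "b", "c"],
})
value = "city"
n = 1

# Boolean-mask form: broadcast each group's size back onto its rows.
kept_mask = df[df.groupby(value)["uid"].transform("size") > n]

# filter form: keep whole groups that have more than n rows.
kept_filter = df.groupby(value).filter(lambda g: len(g) > n)

print(kept_mask)
```

The mask form avoids calling a Python function per group, so it is usually the faster of the two on frames with many groups.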