# Python Simple One-hot Encoding

## One-hot Encoding

*One-hot encoding* transforms categorical data into a fixed-size numeric vector,
where each index maps to one of the unique categories present in the input
data. While you can use `sklearn.preprocessing.OneHotEncoder` or
`pandas.get_dummies` to perform this transformation for you, sometimes it’s nice
to be able to do this with just the Python standard library for quick’n’dirty
scripts. Using `defaultdict` from the `collections` module together with the
`itertools` module, we can make a “dictionary-like” data structure that maps any
hashable value to a unique integer. If a key has not been seen before, it is
assigned the next unused integer (starting with `0`); otherwise, its existing
index is returned. That integer is the position that holds the `1` in the
category’s one-hot vector.

## Code

```
>>> import itertools
>>> from collections import defaultdict
>>> onehot = defaultdict(itertools.count().__next__)
>>> onehot['a']
0
>>> onehot[('b', 'c')]
1
>>> onehot['d']
2
>>> onehot['d']
2
```
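The mapping above only assigns integer indices; to get actual one-hot vectors, each index still has to be expanded into a list of zeros with a single one. Here is a minimal sketch, assuming all categories are collected in a first pass so the vector length is known up front. The `one_hot_vectors` helper and the colour names are hypothetical, chosen only for illustration.

```python
import itertools
from collections import defaultdict


def one_hot_vectors(values):
    """Return (vectors, index) for an iterable of hashable values.

    Hypothetical helper for illustration; not part of any library.
    """
    index = defaultdict(itertools.count().__next__)
    # First pass: assign each distinct value the next integer index.
    ids = [index[v] for v in values]
    size = len(index)
    # Second pass: expand each index into a vector with a single 1.
    vectors = [[1 if i == j else 0 for j in range(size)] for i in ids]
    return vectors, dict(index)


vectors, index = one_hot_vectors(["red", "green", "red", "blue"])
print(index)    # {'red': 0, 'green': 1, 'blue': 2}
print(vectors)  # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Building the vectors in a second pass keeps every vector the same length, which a single streaming pass could not guarantee.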

A normal `dict` can easily be retrieved with:

```
>>> dict(onehot)
{'a': 0, ('b', 'c'): 1, 'd': 2}
```

and since there’s a one-to-one mapping, you can quickly retrieve the reverse
mapping (indices to categories) with the following `dict` comprehension:

```
>>> {v: k for (k, v) in onehot.items()}
{0: 'a', 1: ('b', 'c'), 2: 'd'}
```