pandas.Index.drop_duplicates

Index.drop_duplicates(keep='first') [source]

Return Index with duplicate values removed.

Parameters:

keep : {‘first’, ‘last’, False}, default ‘first’

  • ‘first’ : Drop duplicates except for the first occurrence.
  • ‘last’ : Drop duplicates except for the last occurrence.
  • False : Drop all duplicates.
Returns:
deduplicated : Index

See also

Series.drop_duplicates
equivalent method on Series
DataFrame.drop_duplicates
equivalent method on DataFrame
Index.duplicated
related method on Index, indicating duplicate Index values.

Examples

Generate an pandas.Index with duplicate values.

>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])

The keep parameter controls which duplicate values are removed. The value ‘first’ keeps the first occurrence for each set of duplicated entries. The default value of keep is ‘first’.

>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')

The value ‘last’ keeps the last occurrence for each set of duplicated entries.

>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')

The value False discards all sets of duplicated entries.

>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')

© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.Index.drop_duplicates.html