pandas.core.groupby.GroupBy.apply

GroupBy.apply(func, *args, **kwargs)[source]

Apply function func group-wise and combine the results together.

The function passed to apply must take a dataframe as its first argument and return a DataFrame, Series or scalar. apply will then take care of combining the results back together into a single dataframe or series. apply is therefore a highly flexible grouping method.

While apply is a very flexible method, its downside is that using it can be quite a bit slower than using more specific methods like agg or transform. Pandas offers a wide range of method that will be much faster than using apply for their specific purposes, so try to use them before reaching for apply.

Parameters
func:callable

A callable that takes a dataframe as its first argument, and returns a dataframe, a series or a scalar. In addition the callable may take positional and keyword arguments.

args, kwargs:tuple and dict

Optional positional and keyword arguments to pass to func.

Returns
applied:Series or DataFrame

See also

pipe

Apply function to the full GroupBy object instead of to each group.

aggregate

Apply aggregate function to the GroupBy object.

transform

Apply function column-by-column to the GroupBy object.

Series.apply

Apply a function to a Series.

DataFrame.apply

Apply a function to each row or column of a DataFrame.

Notes

In the current implementation apply calls func twice on the first group to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first group.

Changed in version 1.3.0: The resulting dtype will reflect the return value of the passed func, see the examples below.

Examples

>>> df = pd.DataFrame({'A': 'a a b'.split(),
...                    'B': [1,2,3],
...                    'C': [4,6,5]})
>>> g = df.groupby('A')

Notice that g has two groups, a and b. Calling apply in various ways, we can get different grouping results:

Example 1: below the function passed to apply takes a DataFrame as its argument and returns a DataFrame. apply combines the result for each group together into a new DataFrame:

>>> g[['B', 'C']].apply(lambda x: x / x.sum())
          B    C
0  0.333333  0.4
1  0.666667  0.6
2  1.000000  1.0

Example 2: The function passed to apply takes a DataFrame as its argument and returns a Series. apply combines the result for each group together into a new DataFrame.

Changed in version 1.3.0: The resulting dtype will reflect the return value of the passed func.

>>> g[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
     B    C
A
a  1.0  2.0
b  0.0  0.0

Example 3: The function passed to apply takes a DataFrame as its argument and returns a scalar. apply combines the result for each group together into a Series, including setting the index as appropriate:

>>> g.apply(lambda x: x.C.max() - x.B.min())
A
a    5
b    2
dtype: int64

© 2008–2021, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/1.3.4/reference/api/pandas.core.groupby.GroupBy.apply.html