pandas.core.groupby.DataFrameGroupBy.describe
DataFrameGroupBy.describe(**kwargs)[source]
Parameters:
percentiles : list-like of numbers, optional
    The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
include : ‘all’, list-like of dtypes or None (default), optional
    A whitelist of data types to include in the result. Ignored for Series. Here are the options:
    - ‘all’ : All columns of the input will be included in the output.
    - A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types submit numpy.number. To limit it instead to categorical objects submit the numpy.object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])).
    - None (default) : The result will include all numeric columns.
exclude : list-like of dtypes or None (default), optional
    A blacklist of data types to omit from the result. Ignored for Series. Here are the options:
    - A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types submit numpy.number. To exclude categorical objects submit the data type numpy.object. Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])).
    - None (default) : The result will exclude nothing.
Returns:
summary : Series/DataFrame of summary statistics

Notes

For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.

For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.

If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.

Examples

Describing a numeric Series.

>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0

Describing a categorical Series.

>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object

Describing a timestamp Series.

>>> s = pd.Series([
...   np.datetime64("2000-01-01"),
...   np.datetime64("2010-01-01"),
...   np.datetime64("2010-01-01")
... ])
>>> s.describe()
count                       3
unique                      2
top       2010-01-01 00:00:00
freq                        2
first     2000-01-01 00:00:00
last      2010-01-01 00:00:00
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
...                   columns=['numeric', 'object'])
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')
        numeric object
count       3.0      3
unique      NaN      3
top         NaN      b
freq        NaN      1
mean        2.0    NaN
std         1.0    NaN
min         1.0    NaN
25%         1.5    NaN
50%         2.0    NaN
75%         2.5    NaN
max         3.0    NaN

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[np.object])
       object
count       3
unique      3
top         b
freq        1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])
       object
count       3
unique      3
top         b
freq        1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[np.object])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0
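The examples above call describe on a plain Series or DataFrame. As a minimal sketch of the grouped form this page documents, the snippet below calls describe on a groupby object. The frame and its column names (group, value) are made up for illustration, and the sketch assumes a pandas version in which the grouped result has one row per group with the statistics as columns (0.20 or later).

```python
import pandas as pd

# Hypothetical data: one grouping column and one numeric column.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Describe every non-grouping column per group. The result has one row
# per group; its columns are a two-level index of (column, statistic).
summary = df.groupby("group").describe()

# Select the statistics for a single original column.
value_stats = summary["value"]
print(value_stats)

# Keyword arguments such as percentiles are passed through to describe.
# The median (50%) is included alongside the requested percentiles.
custom = df.groupby("group")["value"].describe(percentiles=[0.1, 0.9])
print(custom)
```

Selecting a single column before calling describe, as in the second call, yields flat statistic columns, which is usually easier to index than the two-level result.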
    © 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
    https://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.core.groupby.DataFrameGroupBy.describe.html