pandas.wide_to_long
- 
pandas.wide_to_long(df, stubnames, i, j, sep='', suffix='\\d+')[source]
- 
Wide panel to long format. Less flexible but more user-friendly than melt. With stubnames [‘A’, ‘B’], this function expects to find one or more group of columns with format Asuffix1, Asuffix2,..., Bsuffix1, Bsuffix2,... You specify what you want to call this suffix in the resulting long format with j(for examplej=’year’)Each row of these wide variables are assumed to be uniquely identified by i(can be a single column name or a list of column names)All remaining variables in the data frame are left intact. Parameters: df : DataFrame The wide-format DataFrame stubnames : str or list-like The stub name(s). The wide format variables are assumed to start with the stub names. i : str or list-like Column(s) to use as id variable(s) j : str The name of the subobservation variable. What you wish to name your suffix in the long format. sep : str, default “” A character indicating the separation of the variable names in the wide format, to be stripped from the names in the long format. For example, if your column names are A-suffix1, A-suffix2, you can strip the hypen by specifying sep=’-‘New in version 0.20.0. suffix : str, default ‘\d+’ A regular expression capturing the wanted suffixes. ‘\d+’ captures numeric suffixes. Suffixes with no numbers could be specified with the negated character class ‘\D+’. You can also further disambiguate suffixes, for example, if your wide variables are of the form Aone, Btwo,.., and you have an unrelated column Arating, you can ignore the last one by specifying suffix=’(!?one|two)’New in version 0.20.0. Returns: DataFrame A DataFrame that contains each stub name as a variable, with new index (i, j) NotesAll extra variables are left untouched. This simply uses pandas.meltunder the hood, but is hard-coded to “do the right thing” in a typicaly case.Examples>>> import pandas as pd >>> import numpy as np >>> np.random.seed(123) >>> df = pd.DataFrame({"A1970" : {0 : "a", 1 : "b", 2 : "c"}, ... "A1980" : {0 : "d", 1 : "e", 2 : "f"}, ... "B1970" : {0 : 2.5, 1 : 1.2, 2 : .7}, ... "B1980" : {0 : 3.2, 1 : 1.3, 2 : .1}, ... "X" : dict(zip(range(3), np.random.randn(3))) ... }) >>> df["id"] = df.index >>> df A1970 A1980 B1970 B1980 X id 0 a d 2.5 3.2 -1.085631 0 1 b e 1.2 1.3 0.997345 1 2 c f 0.7 0.1 0.282978 2 >>> pd.wide_to_long(df, ["A", "B"], i="id", j="year") X A B id year 0 1970 -1.085631 a 2.5 1 1970 0.997345 b 1.2 2 1970 0.282978 c 0.7 0 1980 -1.085631 d 3.2 1 1980 0.997345 e 1.3 2 1980 0.282978 f 0.1With multuple id columns >>> df = pd.DataFrame({ ... 'famid': [1, 1, 1, 2, 2, 2, 3, 3, 3], ... 'birth': [1, 2, 3, 1, 2, 3, 1, 2, 3], ... 'ht1': [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1], ... 'ht2': [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9] ... }) >>> df birth famid ht1 ht2 0 1 1 2.8 3.4 1 2 1 2.9 3.8 2 3 1 2.2 2.9 3 1 2 2.0 3.2 4 2 2 1.8 2.8 5 3 2 1.9 2.4 6 1 3 2.2 3.3 7 2 3 2.3 3.4 8 3 3 2.1 2.9 >>> l = pd.wide_to_long(df, stubnames='ht', i=['famid', 'birth'], j='age') >>> l ht famid birth age 1 1 1 2.8 2 3.4 2 1 2.9 2 3.8 3 1 2.2 2 2.9 2 1 1 2.0 2 3.2 2 1 1.8 2 2.8 3 1 1.9 2 2.4 3 1 1 2.2 2 3.3 2 1 2.3 2 3.4 3 1 2.1 2 2.9Going from long back to wide just takes some creative use of unstack>>> w = l.reset_index().set_index(['famid', 'birth', 'age']).unstack() >>> w.columns = pd.Index(w.columns).str.join('') >>> w.reset_index() famid birth ht1 ht2 0 1 1 2.8 3.4 1 1 2 2.9 3.8 2 1 3 2.2 2.9 3 2 1 2.0 3.2 4 2 2 1.8 2.8 5 2 3 1.9 2.4 6 3 1 2.2 3.3 7 3 2 2.3 3.4 8 3 3 2.1 2.9Less wieldy column names are also handled >>> df = pd.DataFrame({'A(quarterly)-2010': np.random.rand(3), ... 'A(quarterly)-2011': np.random.rand(3), ... 'B(quarterly)-2010': np.random.rand(3), ... 'B(quarterly)-2011': np.random.rand(3), ... 'X' : np.random.randint(3, size=3)}) >>> df['id'] = df.index >>> df A(quarterly)-2010 A(quarterly)-2011 B(quarterly)-2010 B(quarterly)-2011 0 0.531828 0.724455 0.322959 0.293714 1 0.634401 0.611024 0.361789 0.630976 2 0.849432 0.722443 0.228263 0.092105 \ X id 0 0 0 1 1 1 2 2 2 >>> pd.wide_to_long(df, ['A(quarterly)', 'B(quarterly)'], i='id', j='year', sep='-') X A(quarterly) B(quarterly) id year 0 2010 0 0.531828 0.322959 1 2010 2 0.634401 0.361789 2 2010 2 0.849432 0.228263 0 2011 0 0.724455 0.293714 1 2011 2 0.611024 0.630976 2 2011 2 0.722443 0.092105If we have many columns, we could also use a regex to find our stubnames and pass that list on to wide_to_long >>> stubnames = set([match[0] for match in df.columns.str.findall('[A-B]\(.*\)').values if match != [] ]) >>> list(stubnames) ['B(quarterly)', 'A(quarterly)']
    © 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
    https://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.wide_to_long.html