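11. Slicing a pandas DataFrame

In pandas, DataFrame[label] selects a column, where label is a single column label (a string or an integer, depending on how the columns are named). DataFrame[start:end], by contrast, is a slice and selects rows.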

import pandas as pd
import numpy as np
# Build a 10x5 frame holding 10..59, labelled on both axes
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print("dataframe", "*" * 11)
print(df1)
print("*" * 21, "<- dataframe")
print(df1["bx"])     # a single label selects a column
print(df1["a":"e"])  # a slice selects rows; the end label "e" is included

Output:

dataframe ***********
   ax  bx  cx  dx  ex
a  10  11  12  13  14
b  15  16  17  18  19
c  20  21  22  23  24
d  25  26  27  28  29
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
********************* <- dataframe
a    11
b    16
c    21
d    26
e    31
f    36
g    41
h    46
i    51
j    56
Name: bx, dtype: int64
   ax  bx  cx  dx  ex
a  10  11  12  13  14
b  15  16  17  18  19
c  20  21  22  23  24
d  25  26  27  28  29
e  30  31  32  33  34
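
The output above shows that the label slice "a":"e" keeps row e. As a small sketch of that detail (the names by_label and by_position are illustrative only), the following compares a label slice with an integer slice inside []. It assumes, as in the example, that the index holds strings, so that an integer slice is treated as positional.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 60).reshape(10, 5),
    columns=["ax", "bx", "cx", "dx", "ex"],
    index=list("abcdefghij"),
)

by_label = df1["a":"e"]   # label slice: the end label "e" is included
by_position = df1[0:5]    # integer slice: positional, end position 5 is excluded

print(list(by_label.index))
print(by_label.equals(by_position))
```

Both expressions pick the same five rows, but for different reasons: the label slice is closed on both ends, while the integer slice follows the usual half-open Python convention.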

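Passing a list inside [] selects several columns at once; strictly speaking this is not slicing. Passing two lists, however, does not select multiple rows and multiple columns.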

import pandas as pd
import numpy as np
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print("dataframe", "*" * 11)
print(df1)
print("*" * 21, "<- dataframe")
print(df1[["bx", "cx", "ex"]])  # a list of labels selects several columns
# Two lists inside [] are not supported; uncommenting the next line raises an error
#print(df1[["a", "e"], ["bx", "cx", "ex"]])

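Selecting a block of several rows and several columns with slices inside DataFrame[] is awkward, but the loc and iloc indexers of a DataFrame accept a row slice and a column slice together.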
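
As a minimal sketch of that difference, the two-list selection that [] rejects can be written with loc, which takes a row selector and a column selector separated by a comma. The variable names block and plain_brackets_failed are illustrative only, and the broad except clause is used because the exact exception type may differ between pandas versions.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 60).reshape(10, 5),
    columns=["ax", "bx", "cx", "dx", "ex"],
    index=list("abcdefghij"),
)

# Plain [] with two lists is not a valid selection
try:
    df1[["a", "e"], ["bx", "cx", "ex"]]
    plain_brackets_failed = False
except Exception:
    plain_brackets_failed = True

# loc takes the row labels first, then the column labels
block = df1.loc[["a", "e"], ["bx", "cx", "ex"]]

print(plain_brackets_failed)
print(block)
```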

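11.1 Row and column slicing with loc[]

Given a label slice for the rows and a label slice for the columns, loc[] selects a rectangular block. Both end labels of each slice are included.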

import pandas as pd
import numpy as np
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print("dataframe", "*" * 11)
print(df1)
print("*" * 21, "<- dataframe")
# Rows b..e and columns bx..ex, end labels included
print(df1.loc["b":"e", "bx":"ex"])

Output:

dataframe ***********
   ax  bx  cx  dx  ex
a  10  11  12  13  14
b  15  16  17  18  19
c  20  21  22  23  24
d  25  26  27  28  29
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
********************* <- dataframe
   bx  cx  dx  ex
b  16  17  18  19
c  21  22  23  24
d  26  27  28  29
e  31  32  33  34
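
A few variations on the loc[] slice may help here. This is only a sketch, and the names block, all_rows and every_other are illustrative: it shows the inclusive end labels, a bare ":" that keeps a whole axis, and a label slice with a step.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 60).reshape(10, 5),
    columns=["ax", "bx", "cx", "dx", "ex"],
    index=list("abcdefghij"),
)

block = df1.loc["b":"e", "bx":"ex"]     # both end labels are included
all_rows = df1.loc[:, "bx":"cx"]        # a bare ":" keeps every row
every_other = df1.loc["b":"h":2, "ax"]  # label slices also accept a step

print(block.shape)
print(all_rows.shape)
print(list(every_other.index))
```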

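11.2 Row and column slicing with iloc[]

Given a positional slice for the rows and a positional slice for the columns, iloc[] also selects a rectangular block. Positions start at 0 and the end position is excluded, as in ordinary Python slicing.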

import pandas as pd
import numpy as np
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print("dataframe", "*" * 11)
print(df1)
print("*" * 21, "<- dataframe")
# Row positions 2..5 and column positions 2..3, end positions excluded
print(df1.iloc[2:6, 2:4])

Output:

dataframe ***********
   ax  bx  cx  dx  ex
a  10  11  12  13  14
b  15  16  17  18  19
c  20  21  22  23  24
d  25  26  27  28  29
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
********************* <- dataframe
   cx  dx
c  22  23
d  27  28
e  32  33
f  37  38
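
To make the half-open positional convention concrete, here is a short sketch (the names block and tail are illustrative). It repeats the slice from the example and adds a negative start position, which counts from the end as it does for Python lists.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 60).reshape(10, 5),
    columns=["ax", "bx", "cx", "dx", "ex"],
    index=list("abcdefghij"),
)

block = df1.iloc[2:6, 2:4]  # row positions 2, 3, 4, 5 and column positions 2, 3
tail = df1.iloc[-2:, :2]    # last two rows, first two columns

print(block.shape)
print(tail)
```

A slice 2:6 therefore yields 6 - 2 = 4 rows, whereas the label slice "b":"e" in loc[] yields every row from b through e inclusive.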

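11.3 Row and column slicing with ix[]

ix[] accepts slices given by position or by label, and the two kinds can be mixed. The row slice comes first and the column slice second, so ix[] can also select a block. Note that ix was deprecated in pandas 0.20.0 and removed in pandas 1.0.0, so the example in this subsection only runs on older versions.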

import pandas as pd
import numpy as np
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print("dataframe", "*" * 11)
print(df1)
print("*" * 21, "<- dataframe")
# Requires pandas < 1.0: rows by position (end excluded), columns by label (end included)
print(df1.ix[2:6, "bx":"ex"])

Output:

dataframe ***********
   ax  bx  cx  dx  ex
a  10  11  12  13  14
b  15  16  17  18  19
c  20  21  22  23  24
d  25  26  27  28  29
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
********************* <- dataframe
   bx  cx  dx  ex
c  21  22  23  24
d  26  27  28  29
e  31  32  33  34
f  36  37  38  39
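
On pandas versions that no longer provide ix, the same mixed selection (rows by position, columns by label) can be approximated with iloc and loc. The following is a sketch of two possible approaches, not the only way to do it; the names start, stop, mixed and chained are illustrative. It assumes the column labels are unique, so that Index.get_loc returns a single integer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 60).reshape(10, 5),
    columns=["ax", "bx", "cx", "dx", "ex"],
    index=list("abcdefghij"),
)

# Option 1: translate the column labels to positions, then use iloc only
start = df1.columns.get_loc("bx")
stop = df1.columns.get_loc("ex") + 1  # +1 because iloc excludes the end position
mixed = df1.iloc[2:6, start:stop]

# Option 2: chain a positional row slice with a label-based column slice
chained = df1.iloc[2:6].loc[:, "bx":"ex"]

print(mixed)
print(mixed.equals(chained))
```

Option 1 keeps everything in a single indexing call, which is usually preferable when the result will be assigned to; option 2 is shorter to read when the selection is only being inspected.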