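10. Accessing a Pandas DataFrame
A DataFrame is a two-dimensional data structure in which each column is a Series. You can select a column first and then index into its rows, or use the iloc, loc, and at accessors to reach the data directly.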
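10.1 Selecting columns with []
Using [] on a DataFrame gives a different result from using [] on a Series: on a DataFrame, [] selects every value of one field, that is, a column, whereas on a Series it returns the element stored under the given label.
- Selecting a single column from a DataFrame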
 
import pandas as pd
import numpy as np
# 10 rows x 3 columns holding the integers 10..39
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns=idx)
print("dataframe", "*" * 11)
print(df1)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print("series", "*" * 14)
print(ss)
print(df1["ax"])  # [] on a DataFrame selects a column
print(ss["a"])    # [] on a Series selects one element by label
Program output:
dataframe ***********
   ax  bx  cx
0  10  11  12
1  13  14  15
2  16  17  18
3  19  20  21
4  22  23  24
5  25  26  27
6  28  29  30
7  31  32  33
8  34  35  36
9  37  38  39
series **************
a    2
b    3
c    1
d    4
dtype: int64
0    10 # print(df1["ax"])
1    13
2    16
3    19
4    22
5    25
6    28
7    31
8    34
9    37
Name: ax, dtype: int64
2 # print(ss["a"])
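- Selecting multiple columns from a DataFrame: when [] is given a list of column names, it selects all of those columns, just as a list of labels selects several elements from a Series.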
 
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns=idx)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print(df1[["ax", "cx"]])  # a list of column names selects several columns
print(ss[["a", "d"]])     # a list of labels selects several Series elements
Program output:
   ax  cx # print(df1[["ax", "cx"]])
0  10  12
1  13  15
2  16  18
3  19  21
4  22  24
5  25  27
6  28  30
7  31  33
8  34  36
9  37  39
a    2 # print(ss[["a", "d"]])
d    4
dtype: int64
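The difference between passing a single name and passing a list to [] also shows up in the type of the result. The following is a minimal sketch, assuming Python 3 and a recent pandas; the names one_col and sub_frame are made up for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3), columns=["ax", "bx", "cx"])

one_col = df1["ax"]      # a single column name returns a Series
sub_frame = df1[["ax"]]  # a one-element list returns a DataFrame

print(type(one_col).__name__)          # Series
print(type(sub_frame).__name__)        # DataFrame
print(one_col.shape, sub_frame.shape)  # (10,) (10, 1)
```

Use the list form when later code expects a two-dimensional object, even if only one column is needed.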
10.2 Selecting rows with loc[]
A DataFrame row can be selected with loc[] by giving the row's label.
- Selecting a single row by label.
 
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
# index=lbl labels the rows "a" through "j"
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print("series", "*" * 14)
print(ss)
print(df1.loc["a"])  # the row labelled "a", returned as a Series
print(ss["a"])
Program output:
dataframe ***********
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
d  19  20  21
e  22  23  24
f  25  26  27
g  28  29  30
h  31  32  33
i  34  35  36
j  37  38  39
series **************
a    2
b    3
c    1
d    4
dtype: int64
ax    10 # print(df1.loc["a"])
bx    11
cx    12
Name: a, dtype: int64
2 # print(ss["a"])
- Passing a list of labels to loc[] selects multiple rows.
 
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print(df1.loc[["a", "c"]])  # a list of row labels selects several rows
print(ss[["a", "c"]])
Program output:
   ax  bx  cx # print(df1.loc[["a", "c"]])
a  10  11  12
c  16  17  18
a    2 # print(ss[["a", "c"]])
c    1
dtype: int64
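loc[] also accepts label slices and an optional second selector for columns. The sketch below assumes Python 3 and a recent pandas, and uses the illustrative names rows_b_to_d and block; note that a label slice includes both of its endpoints.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 40).reshape(10, 3),
    columns=["ax", "bx", "cx"],
    index=list("abcdefghij"),
)

# A label slice in loc[] includes both endpoints: rows "b", "c" and "d".
rows_b_to_d = df1.loc["b":"d"]
print(rows_b_to_d)

# A second selector restricts the columns as well.
block = df1.loc[["a", "c"], ["ax", "cx"]]
print(block)
```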
10.3 Selecting rows with iloc[]
Unlike loc[], which takes labels, iloc[] takes integer positions counted from 0.
- Selecting a single row with iloc[].
 
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print(df1.iloc[1])  # row at position 1, i.e. the second row
print(ss.iloc[1])   # positional access; plain ss[1] is deprecated for a label-indexed Series
Program output:
ax    13 # print(df1.iloc[1])
bx    14
cx    15
Name: b, dtype: int64
3 # print(ss.iloc[1])
- To select multiple rows with iloc[], pass a list of the row positions inside the [].
 
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print("series", "*" * 14)
print(ss)
print(df1.iloc[[0, 1, 3]])  # rows at positions 0, 1 and 3
print(ss.iloc[[0, 1, 3]])   # positional access; plain ss[[0, 1, 3]] is deprecated for a label-indexed Series
Program output:
dataframe ***********
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
d  19  20  21
e  22  23  24
f  25  26  27
g  28  29  30
h  31  32  33
i  34  35  36
j  37  38  39
series **************
a    2
b    3
c    1
d    4
dtype: int64
   ax  bx  cx # print(df1.iloc[[0, 1, 3]])
a  10  11  12
b  13  14  15
d  19  20  21
a    2 # print(ss.iloc[[0, 1, 3]])
b    3
d    4
dtype: int64
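iloc[] follows ordinary Python indexing rules, which differ from loc[] in two ways worth seeing side by side: a positional slice excludes its end position, and negative positions count from the end. A minimal sketch, assuming Python 3 and a recent pandas (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 40).reshape(10, 3),
    columns=["ax", "bx", "cx"],
    index=list("abcdefghij"),
)

# A positional slice excludes the end: positions 1, 2 and 3 (rows b, c, d).
middle = df1.iloc[1:4]
print(middle)

# A negative position counts from the end: the last row, labelled "j".
last_row = df1.iloc[-1]
print(last_row)

# Row positions and column positions can be combined.
corner = df1.iloc[[0, 2], [0, 2]]
print(corner)
```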
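10.4 Selecting a single value by label with at[]
A DataFrame has both rows and columns; giving at[] a row label and a column label selects the value at that intersection.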
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
print(df1.at["b", "bx"])  # value at row label "b", column label "bx"
Program output:
dataframe ***********
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
d  19  20  21
e  22  23  24
f  25  26  27
g  28  29  30
h  31  32  33
i  34  35  36
j  37  38  39
14 # print(df1.at["b", "bx"])
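at[] can be used to write a single value as well as read one, and for a scalar lookup it returns the same value as loc[] with the same two labels. The following sketch assumes Python 3 and a recent pandas; the names before and via_loc are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 40).reshape(10, 3),
    columns=["ax", "bx", "cx"],
    index=list("abcdefghij"),
)

before = df1.at["b", "bx"]    # read one value by row and column label
via_loc = df1.loc["b", "bx"]  # loc[] with two scalar labels gives the same value
print(before, via_loc)        # 14 14

df1.at["b", "bx"] = 140       # at[] also assigns a single value in place
print(df1.at["b", "bx"])      # 140
```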
10.5 Selecting a single value by position with iat[]
Giving iat[] a row position and a column position selects the value at that position.
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
print(df1.iat[1, 2])  # value at row position 1, column position 2
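Positions are counted from 0, so the 1 in iat[] refers to the second row (label "b") and the 2 refers to the third column ("cx"); the statement prints 15.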
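The relationship between positions and labels can be checked directly by looking up the labels that sit at the given positions. This is a minimal sketch, assuming Python 3 and a recent pandas; row_pos, col_pos, row_label, and col_label are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 40).reshape(10, 3),
    columns=["ax", "bx", "cx"],
    index=list("abcdefghij"),
)

row_pos, col_pos = 1, 2
value = df1.iat[row_pos, col_pos]   # lookup by position

row_label = df1.index[row_pos]      # label found at row position 1
col_label = df1.columns[col_pos]    # label found at column position 2
print(row_label, col_label, value)  # b cx 15

# The same cell reached by label with at[].
print(value == df1.at[row_label, col_label])  # True
```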
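10.6 Mixed selection with ix[]
ix[] accepts a mix of labels and positions inside the []. Note that ix[] was deprecated in pandas 0.20.0 and removed in pandas 1.0, so the example in this section only runs on older versions; in current pandas use loc[] or iloc[] instead.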
import pandas as pd
import numpy as np
# Note: ix[] requires pandas < 1.0; it was removed in pandas 1.0.
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
print(df1.ix[[0,1,2], ["ax","cx"]])  # rows by position, columns by label
print(df1.ix[["ax","cx"], [0,1,2]])  # column labels passed in the row slot
Program output:
dataframe ***********
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
d  19  20  21
e  22  23  24
f  25  26  27
g  28  29  30
h  31  32  33
i  34  35  36
j  37  38  39
   ax  cx # print(df1.ix[[0,1,2], ["ax","cx"]])
a  10  12
b  13  15
c  16  18
    ax  bx  cx # print(df1.ix[["ax","cx"], [0,1,2]])
ax NaN NaN NaN
cx NaN NaN NaN
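The result of print(df1.ix[["ax","cx"], [0,1,2]]) shows that at, iat, and ix all take the row selector first and the column selector second: "ax" and "cx" are column labels, so when they are passed in the row slot no rows match and every value comes back as NaN.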
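Because ix[] is no longer available in pandas 1.0 and later, a mixed selection has to be expressed entirely in labels (for loc[]) or entirely in positions (for iloc[]). The sketch below shows one way to do each conversion, assuming Python 3 and pandas 1.0 or later; the names by_label, col_positions, by_position, and missing_rows_raise are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(10, 40).reshape(10, 3),
    columns=["ax", "bx", "cx"],
    index=list("abcdefghij"),
)

# Option 1: turn row positions into labels, then use loc[].
by_label = df1.loc[df1.index[[0, 1, 2]], ["ax", "cx"]]
print(by_label)

# Option 2: turn column labels into positions, then use iloc[].
col_positions = df1.columns.get_indexer(["ax", "cx"])
by_position = df1.iloc[[0, 1, 2], col_positions]
print(by_position)

# Passing column labels in the row slot of loc[] raises KeyError
# instead of returning rows of NaN as ix[] did.
try:
    df1.loc[["ax", "cx"]]
    missing_rows_raise = False
except KeyError:
    missing_rows_raise = True
print(missing_rows_raise)
```

Both options return the same three rows and two columns; which one reads better depends on whether the surrounding code mostly works with labels or with positions.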