10. Accessing a Pandas DataFrame

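A DataFrame is a two-dimensional data structure in which every column is a Series. You can select a column first and then access rows within it, or you can access the DataFrame directly through indexers such as loc, iloc, at, and iat.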
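As a minimal sketch of the two access styles just described, the snippet below reads the same cell once by picking the column and then the row, and once with a single loc[] call that takes the row label first and the column label second. It uses the same 10x3 frame as the examples that follow. The single-call form is usually preferred, especially for assignment, because chained indexing may end up operating on a temporary copy.

```python
import numpy as np
import pandas as pd

# Same 10x3 frame used throughout this section (default integer index 0..9)
val = np.arange(10, 40).reshape(10, 3)
df1 = pd.DataFrame(val, columns=["ax", "bx", "cx"])

# Column first, then row: df1["ax"] is a Series, .loc[2] picks the row labelled 2
by_column_then_row = df1["ax"].loc[2]

# The same cell in one indexer call: row label first, then column label
by_indexer = df1.loc[2, "ax"]

print(by_column_then_row, by_indexer)  # 16 16
```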

10.1 Selecting columns with []

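Using [] on a DataFrame behaves differently from using [] on a Series. On a DataFrame, [] with a column label selects all the data in that field, that is, one whole column; on a Series, [] with an index label returns the value of one row.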

  • Selecting a single column of a DataFrame
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns=idx)
print("dataframe", "*" * 11)
print(df1)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print("series", "*" * 14)
print(ss)
print(df1["ax"])  # one column label: returns that column as a Series
print(ss["a"])    # one index label: returns a single element of the Series

Program output:

dataframe ***********
   ax  bx  cx
0  10  11  12
1  13  14  15
2  16  17  18
3  19  20  21
4  22  23  24
5  25  26  27
6  28  29  30
7  31  32  33
8  34  35  36
9  37  38  39
series **************
a    2
b    3
c    1
d    4
dtype: int64
0    10 # print(df1["ax"])
1    13
2    16
3    19
4    22
5    25
6    28
7    31
8    34
9    37
Name: ax, dtype: int64
2 # print(ss["a"])
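  • Selecting several columns of a DataFrame: if [] contains a list of column names, several columns are selected, just as a list of labels selects several elements of a Series.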
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns=idx)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print(df1[["ax", "cx"]])  # a list of column labels selects several columns
print(ss[["a", "d"]])     # a list of labels selects several Series elements

Program output:

   ax  cx # print(df1[["ax", "cx"]])
0  10  12
1  13  15
2  16  18
3  19  21
4  22  24
5  25  27
6  28  30
7  31  33
8  34  36
9  37  39
a    2 # print(ss[["a", "d"]])
d    4
dtype: int64
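The two bullets above differ in what they return, which is easy to miss in printed output. A minimal sketch using the same frame: a single label gives a Series, while any list, even a one-element list, gives a DataFrame whose columns appear in the order listed.

```python
import numpy as np
import pandas as pd

val = np.arange(10, 40).reshape(10, 3)
df1 = pd.DataFrame(val, columns=["ax", "bx", "cx"])

one = df1["ax"]                 # single label -> Series
one_as_frame = df1[["ax"]]      # one-element list -> DataFrame with one column
two = df1[["ax", "cx"]]         # several labels -> DataFrame
reordered = df1[["cx", "ax"]]   # columns come back in the order given

print(type(one).__name__, type(one_as_frame).__name__)  # Series DataFrame
print(one_as_frame.shape, two.shape)                    # (10, 1) (10, 2)
print(list(reordered.columns))                          # ['cx', 'ax']
```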

10.2 Selecting rows with loc[]

In a DataFrame, loc[] selects rows by their index labels.
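As a small sketch of what loc[] returns, assuming the labelled frame used in this section (rows "a" to "j"): a single row comes back as a Series named after its row label, and a second argument narrows the selection to columns, again by label, with the row coming first.

```python
import numpy as np
import pandas as pd

val = np.arange(10, 40).reshape(10, 3)
df1 = pd.DataFrame(val, columns=["ax", "bx", "cx"], index=list("abcdefghij"))

row_a = df1.loc["a"]                 # Series indexed by the column names
cell = df1.loc["a", "bx"]            # row label, then column label
part = df1.loc["a", ["ax", "cx"]]    # one row, two columns

print(row_a.name)     # a  (the row label becomes the Series name)
print(cell)           # 11
print(part.tolist())  # [10, 12]
```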

  • Selecting a single row by label.
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print("series", "*" * 14)
print(ss)
print(df1.loc["a"])  # the row labelled "a", returned as a Series
print(ss["a"])

Program output:

dataframe ***********
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
d  19  20  21
e  22  23  24
f  25  26  27
g  28  29  30
h  31  32  33
i  34  35  36
j  37  38  39
series **************
a    2
b    3
c    1
d    4
dtype: int64
ax    10 # print(df1.loc["a"])
bx    11
cx    12
Name: a, dtype: int64
2 # print(ss["a"])
  • Passing a list of labels inside loc[] selects several rows at once.
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print(df1.loc[["a", "c"]])  # rows labelled "a" and "c"
print(ss[["a", "c"]])

Program output:

   ax  bx  cx # print(df1.loc[["a", "c"]])
a  10  11  12
c  16  17  18
a    2 # print(ss[["a", "c"]])
c    1
dtype: int64
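Two further loc[] behaviours are worth knowing alongside label lists. The sketch below assumes the same labelled frame. A label slice includes its end label, unlike Python slicing. In pandas 1.0 and later, a label list that contains a label missing from the index raises KeyError instead of silently producing a NaN row; the label "z" here is simply one that does not exist.

```python
import numpy as np
import pandas as pd

val = np.arange(10, 40).reshape(10, 3)
df1 = pd.DataFrame(val, columns=["ax", "bx", "cx"], index=list("abcdefghij"))

sliced = df1.loc["b":"d"]        # label slices INCLUDE the end label
print(sliced.index.tolist())     # ['b', 'c', 'd']

missing_raised = False
try:
    df1.loc[["a", "z"]]          # "z" is not a row label
except KeyError:
    missing_raised = True
print(missing_raised)            # True on pandas 1.0+
```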

10.3 Selecting rows with iloc[]

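The difference from loc[] is that iloc[] takes integer positions (counting from 0), whereas loc[] takes labels. The same distinction applies to a Series: use ss.iloc[] for positions, because recent pandas versions deprecate positional lookups through plain ss[] when the index holds labels.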
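The label/position distinction matters most when the index itself consists of integers that are not in positional order. A minimal sketch with a hypothetical three-element Series shows loc[] and iloc[] returning different elements for the same key 0.

```python
import pandas as pd

# Hypothetical Series whose integer labels are NOT in positional order
s = pd.Series(["x", "y", "z"], index=[2, 0, 1])

by_label = s.loc[0]      # the element whose LABEL is 0
by_position = s.iloc[0]  # the element at POSITION 0

print(by_label, by_position)  # y x
```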

  • Selecting a single row with iloc[].
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print(df1.iloc[1])  # position 1 is the second row, labelled "b"
print(ss.iloc[1])   # second element by position (plain ss[1] is deprecated here)

Program output:

ax    13 # print(df1.iloc[1])
bx    14
cx    15
Name: b, dtype: int64
3 # print(ss.iloc[1])
  • To select several rows with iloc[], pass a list of their positions inside [].
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print("series", "*" * 14)
print(ss)
print(df1.iloc[[0, 1, 3]])  # rows at positions 0, 1 and 3
print(ss.iloc[[0, 1, 3]])   # Series elements at positions 0, 1 and 3

Program output:

dataframe ***********
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
d  19  20  21
e  22  23  24
f  25  26  27
g  28  29  30
h  31  32  33
i  34  35  36
j  37  38  39
series **************
a    2
b    3
c    1
d    4
dtype: int64
   ax  bx  cx # print(df1.iloc[[0, 1, 3]])
a  10  11  12
b  13  14  15
d  19  20  21
a    2 # print(ss.iloc[[0, 1, 3]])
b    3
d    4
dtype: int64
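iloc[] also accepts slices, negative positions, and a second argument for column positions. The sketch below assumes the same labelled frame. Note that a positional slice excludes its end position, which is the opposite of a loc[] label slice.

```python
import numpy as np
import pandas as pd

val = np.arange(10, 40).reshape(10, 3)
df1 = pd.DataFrame(val, columns=["ax", "bx", "cx"], index=list("abcdefghij"))

first_two = df1.iloc[1:3]            # positions 1 and 2; end position 3 is excluded
last_row = df1.iloc[-1]              # negative positions count from the end
block = df1.iloc[[0, 1, 3], [0, 2]]  # rows a, b, d and columns ax, cx

print(first_two.index.tolist())          # ['b', 'c']
print(last_row.name, last_row.tolist())  # j [37, 38, 39]
print(block.values.tolist())             # [[10, 12], [13, 15], [19, 21]]
```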

10.4 Selecting a single value by labels with at[]

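Because a DataFrame has both rows and columns, giving a row label and a column label inside at[] selects the single value where that row and column meet.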

import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
print(df1.at["b", "bx"])  # row label "b", column label "bx"

Program output:

dataframe ***********
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
d  19  20  21
e  22  23  24
f  25  26  27
g  28  29  30
h  31  32  33
i  34  35  36
j  37  38  39
14 # print(df1.at["b", "bx"])
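at[] reads the same cell as loc[] with a row label and a column label, but it only accepts single labels and is meant for fast scalar access. A minimal sketch, assuming the same frame, that also shows at[] used for assignment on a copy so the original frame is left unchanged:

```python
import numpy as np
import pandas as pd

val = np.arange(10, 40).reshape(10, 3)
df1 = pd.DataFrame(val, columns=["ax", "bx", "cx"], index=list("abcdefghij"))

v_at = df1.at["b", "bx"]    # scalar lookup by labels
v_loc = df1.loc["b", "bx"]  # the same cell through loc
print(v_at, v_loc)          # 14 14

df2 = df1.copy()            # modify a copy, not df1 itself
df2.at["b", "bx"] = 99      # at[] can also set a single value
print(df2.at["b", "bx"], df1.at["b", "bx"])  # 99 14
```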

10.5 Selecting a single value by position with iat[]

Giving a row position and a column position inside iat[] selects the value at that position.

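import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
print(df1.iat[1, 2])  # row position 1, column position 2

Program output:

dataframe ***********
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
d  19  20  21
e  22  23  24
f  25  26  27
g  28  29  30
h  31  32  33
i  34  35  36
j  37  38  39
15 # print(df1.iat[1, 2])

Positions in iat[] count from 0, so the 1 refers to the second row ("b") and the 2 to the third column ("cx").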
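Since iat[] is the positional counterpart of at[], the same cell can be reached three ways. The sketch below assumes the same labelled frame and translates the positions into labels through df1.index and df1.columns to confirm that all three lookups agree.

```python
import numpy as np
import pandas as pd

val = np.arange(10, 40).reshape(10, 3)
df1 = pd.DataFrame(val, columns=["ax", "bx", "cx"], index=list("abcdefghij"))

v_iat = df1.iat[1, 2]     # scalar lookup by positions
v_iloc = df1.iloc[1, 2]   # the same cell through iloc

row_label = df1.index[1]    # position 1 -> label "b"
col_label = df1.columns[2]  # position 2 -> label "cx"
v_at = df1.at[row_label, col_label]

print(row_label, col_label)  # b cx
print(v_iat, v_iloc, v_at)   # 15 15 15
```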

10.6 Mixed selection with ix[]

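Inside ix[], labels and positions can be mixed. Note that ix was deprecated in pandas 0.20.0 and removed in pandas 1.0.0, so the code below only runs on older pandas versions; on current versions use loc[] or iloc[] instead.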

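# ix[] was removed in pandas 1.0.0; this example needs pandas < 1.0
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
print(df1.ix[[0, 1, 2], ["ax", "cx"]])  # row positions, column labels
print(df1.ix[["ax", "cx"], [0, 1, 2]])  # "ax"/"cx" are looked up as ROW labels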

Program output:

dataframe ***********
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
d  19  20  21
e  22  23  24
f  25  26  27
g  28  29  30
h  31  32  33
i  34  35  36
j  37  38  39
   ax  cx # print(df1.ix[[0, 1, 2], ["ax", "cx"]])
a  10  12
b  13  15
c  16  18
    ax  bx  cx # print(df1.ix[["ax", "cx"], [0, 1, 2]])
ax NaN NaN NaN
cx NaN NaN NaN

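The result of print(df1.ix[["ax", "cx"], [0, 1, 2]]) shows that at, iat, and ix take the row first and the column second: "ax" and "cx" were looked up as row labels, and because no such rows exist, those rows are filled with NaN. The same row-then-column order applies when two arguments are passed to loc[] and iloc[].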
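On pandas 1.0 and later, where ix[] no longer exists, the mixed selections above can be rewritten by converting one axis so that both arguments are of the same kind. Here is a minimal sketch assuming the same frame. One option turns row positions into labels for loc[], another turns column labels into positions for iloc[]. The NaN-filled second result is reproduced deliberately with reindex, since loc[] now raises KeyError for missing labels.

```python
import numpy as np
import pandas as pd

val = np.arange(10, 40).reshape(10, 3)
df1 = pd.DataFrame(val, columns=["ax", "bx", "cx"], index=list("abcdefghij"))

# Replacement for df1.ix[[0, 1, 2], ["ax", "cx"]]
# Option 1: row positions -> row labels, then select with loc
a = df1.loc[df1.index[[0, 1, 2]], ["ax", "cx"]]
# Option 2: column labels -> column positions, then select with iloc
b = df1.iloc[[0, 1, 2], df1.columns.get_indexer(["ax", "cx"])]

# Replacement for df1.ix[["ax", "cx"], [0, 1, 2]]: "ax" and "cx" are not row
# labels, so reindex creates those rows and fills them with NaN
c = df1.reindex(index=["ax", "cx"], columns=df1.columns[[0, 1, 2]])

print(a)
print(c)
```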