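The index of a DataFrame can also be set, again by passing in a list:

    frame = DataFrame(data, columns=['sale', 'fruit', 'year'], index=[4, 3, 2, 1, 0])
    frame
    Out[22]:
        sale   fruit  year
    4  15000   Apple  2010
    3  17000   Apple  2011
    2  36000  Orange  2012
    1  24000  Orange  2011
    0  29000  Banana  2012

Passing [4, 3, 2, 1, 0] labels the rows 4, 3, 2, 1, 0 instead of the default 0, 1, 2, 3, 4. Only the labels change; the rows stay in their original order.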
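As a minimal sketch of the same idea, the snippet below sets the index at construction time and then replaces it on an existing frame. The variable names `first_row_label`, `apple_2010_sale`, and `relabelled` are illustrative only, and the letter labels are an assumption chosen for the example.

```python
from pandas import DataFrame

data = {
    "fruit": ["Apple", "Apple", "Orange", "Orange", "Banana"],
    "year": [2010, 2011, 2012, 2011, 2012],
    "sale": [15000, 17000, 36000, 24000, 29000],
}

# Labels passed at construction time are attached to the rows in order.
frame = DataFrame(data, columns=["sale", "fruit", "year"], index=[4, 3, 2, 1, 0])
first_row_label = frame.index[0]        # label of the first row
apple_2010_sale = frame.loc[4, "sale"]  # label-based lookup of that row

# The index of an existing frame can also be replaced by assignment;
# the new list needs exactly one label per row.
frame.index = ["a", "b", "c", "d", "e"]
relabelled = list(frame.index)
```

Replacing the index this way only renames the row labels; the data in each row is unchanged.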
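Getting a Series from a DataFrame:

    frame['year']
    Out[26]:
    0    2010
    1    2011
    2    2012
    3    2011
    4    2012
    Name: year, dtype: int64

    frame['fruit']
    Out[27]:
    0     Apple
    1     Apple
    2    Orange
    3    Orange
    4    Banana
    Name: fruit, dtype: object

Both frame['fruit'] and frame.fruit select the column, and both return a Series. The attribute form only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.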
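A short sketch of the two access styles, using a smaller hypothetical frame than the one in the text:

```python
from pandas import DataFrame, Series

frame = DataFrame({"fruit": ["Apple", "Orange", "Banana"],
                   "year": [2010, 2012, 2012]})

by_key = frame["fruit"]   # bracket access works for any column label
by_attr = frame.fruit     # attribute access needs an identifier-like name

same_type = isinstance(by_key, Series)  # a single column is a Series
same_values = by_key.equals(by_attr)    # both styles return the same data
series_name = by_key.name               # the Series keeps the column label as its name
```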
Assigning to a DataFrame means assigning to its columns: select a column, then assign to it to change that column's values:

    import numpy as np  # needed for np.arange below

    data = {'fruit': ['Apple', 'Apple', 'Orange', 'Orange', 'Banana'],
            'year': [2010, 2011, 2012, 2011, 2012],
            'sale': [15000, 17000, 36000, 24000, 29000]}
    frame = DataFrame(data, columns=['sale', 'fruit', 'year', 'price'])
    frame
    Out[24]:
        sale   fruit  year  price
    0  15000   Apple  2010    NaN
    1  17000   Apple  2011    NaN
    2  36000  Orange  2012    NaN
    3  24000  Orange  2011    NaN
    4  29000  Banana  2012    NaN

    frame['price'] = 20
    frame
    Out[26]:
        sale   fruit  year  price
    0  15000   Apple  2010     20
    1  17000   Apple  2011     20
    2  36000  Orange  2012     20
    3  24000  Orange  2011     20
    4  29000  Banana  2012     20

    frame.price = 40
    frame
    Out[28]:
        sale   fruit  year  price
    0  15000   Apple  2010     40
    1  17000   Apple  2011     40
    2  36000  Orange  2012     40
    3  24000  Orange  2011     40
    4  29000  Banana  2012     40

    frame.price = np.arange(5)
    frame
    Out[30]:
        sale   fruit  year  price
    0  15000   Apple  2010      0
    1  17000   Apple  2011      1
    2  36000  Orange  2012      2
    3  24000  Orange  2011      3
    4  29000  Banana  2012      4

frame['price'] and frame.price both select the price column. Assigning a scalar, as in frame['price'] = 20 or frame.price = 40, sets every value in the column to that scalar. The attribute form works here because price already exists as a column.
A column can also be assigned from an array built with NumPy's arange function, as the last step of the code above shows.
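The two kinds of assignment above can be sketched as follows. The frame is a cut-down, hypothetical version of the one in the text, and the final `try` block shows an array of the wrong length being rejected:

```python
import numpy as np
from pandas import DataFrame

frame = DataFrame({
    "fruit": ["Apple", "Apple", "Orange", "Orange", "Banana"],
    "sale": [15000, 17000, 36000, 24000, 29000],
})

# A scalar is broadcast to every row of the column.
frame["price"] = 20
all_twenty = bool((frame["price"] == 20).all())

# An array is assigned element by element, in row order.
frame["price"] = np.arange(5)
prices = frame["price"].tolist()

# An array must have exactly one element per row.
try:
    frame["price"] = np.arange(3)
    length_error = False
except ValueError:
    length_error = True
```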
A column can also be assigned from a Series:

    data = {'fruit': ['Apple', 'Apple', 'Orange', 'Orange', 'Banana'],
            'year': [2010, 2011, 2012, 2011, 2012],
            'sale': [15000, 17000, 36000, 24000, 29000]}
    frame = DataFrame(data, columns=['sale', 'fruit', 'year', 'price'])
    frame
    Out[6]:
        sale   fruit  year  price
    0  15000   Apple  2010    NaN
    1  17000   Apple  2011    NaN
    2  36000  Orange  2012    NaN
    3  24000  Orange  2011    NaN
    4  29000  Banana  2012    NaN

    priceSeries = Series([3.4, 4.2, 2.4], index=[1, 2, 4])
    frame.price = priceSeries
    frame
    Out[9]:
        sale   fruit  year  price
    0  15000   Apple  2010    NaN
    1  17000   Apple  2011    3.4
    2  36000  Orange  2012    4.2
    3  24000  Orange  2011    NaN
    4  29000  Banana  2012    2.4

With this kind of assignment, the DataFrame's index is matched against the Series's index automatically. Each value is placed at the row with the same label, and rows with no matching label are filled with the missing value NaN.
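The alignment rule can be sketched with a small hypothetical frame. The second assignment uses a label (10) that does not exist in the frame, to show that unmatched Series values are simply dropped:

```python
from pandas import DataFrame, Series

frame = DataFrame({"fruit": ["Apple", "Apple", "Orange", "Orange", "Banana"]})

# Values are placed by index label, not by position.
frame["price"] = Series([3.4, 4.2, 2.4], index=[1, 2, 4])
filled = frame["price"].notna().tolist()  # which rows received a value
price_at_4 = frame.loc[4, "price"]

# A Series label that is absent from the frame contributes nothing,
# so every row ends up as NaN here.
frame["price"] = Series([9.9], index=[10])
all_missing = bool(frame["price"].isna().all())
```

This differs from list or array assignment, which is purely positional.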
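If the Series is created without an explicit index, it gets the default labels 0, 1, 2, so the assignment fills the first three rows:

    priceSeries = Series([3.4, 4.2, 2.4])
    frame.price = priceSeries
    frame
    Out[12]:
        sale   fruit  year  price
    0  15000   Apple  2010    3.4
    1  17000   Apple  2011    4.2
    2  36000  Orange  2012    2.4
    3  24000  Orange  2011    NaN
    4  29000  Banana  2012    NaN

A DataFrame column can also be assigned from a list or an array, but its length must equal the number of rows in the DataFrame: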
    priceList = [3.4, 2.4, 4.6, 3.8, 7.3]
    frame.price = priceList
    frame
    Out[15]:
        sale   fruit  year  price
    0  15000   Apple  2010    3.4
    1  17000   Apple  2011    2.4
    2  36000  Orange  2012    4.6
    3  24000  Orange  2011    3.8
    4  29000  Banana  2012    7.3

Assigning to a column that does not exist creates a new column:
    frame['total'] = 30000
    frame
    Out[45]:
        sale   fruit  year  price  total
    0  15000   Apple  2010    3.4  30000
    1  17000   Apple  2011    2.4  30000
    2  36000  Orange  2012    4.6  30000
    3  24000  Orange  2011    3.8  30000
    4  29000  Banana  2012    7.3  30000

The example above adds a new column, total, by assigning to a column that did not exist. A new column must be created with frame['total']; frame.total cannot be used for this. Assigning through attribute syntax to a name that is not yet a column does not create a column at all: pandas stores the value as an ordinary Python attribute on the DataFrame object. The value can be read back as frame.total, but it is not part of the data and never appears when the DataFrame is displayed. The code below assigns to total first with frame.total and then with frame['total']; the total column does not exist at the start:

    frame
    Out[60]:
        sale   fruit  year  price
    0  15000   Apple  2010    3.4
    1  17000   Apple  2011    2.4
    2  36000  Orange  2012    4.6
    3  24000  Orange  2011    3.8
    4  29000  Banana  2012    7.3

    frame.total = 20
    frame
    Out[62]:
        sale   fruit  year  price
    0  15000   Apple  2010    3.4
    1  17000   Apple  2011    2.4
    2  36000  Orange  2012    4.6
    3  24000  Orange  2011    3.8
    4  29000  Banana  2012    7.3

    frame['total'] = 20
    frame
    Out[64]:
        sale   fruit  year  price  total
    0  15000   Apple  2010    3.4     20
    1  17000   Apple  2011    2.4     20
    2  36000  Orange  2012    4.6     20
    3  24000  Orange  2011    3.8     20
    4  29000  Banana  2012    7.3     20

After frame.total = 20 the displayed frame has no total column, because none was created. After frame['total'] = 20 the total column is present.
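The difference between the two forms can be checked directly. This is a minimal sketch on a hypothetical two-row frame. The `warnings` guard is an assumption added for safety, since some pandas versions may emit a UserWarning for attribute assignment of this kind (for example when the value is list-like):

```python
import warnings
from pandas import DataFrame

frame = DataFrame({"sale": [15000, 17000], "fruit": ["Apple", "Orange"]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence any pandas warning for this demo
    frame.total = 20                 # sets a plain attribute, not a column

has_column_after_attr = "total" in frame.columns  # no column was created
attr_value = frame.total                          # the plain Python value

frame["total"] = 20                               # bracket form creates the column
has_column_after_key = "total" in frame.columns
```

Using the bracket form for every column creation avoids this pitfall; attribute access is best kept for reading columns that already exist.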