Translation: Practical Python Programming, 03_02_More_functions (2)

Date: 2021-05-04 Category: Programmer's Life

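A central part of report.py is devoted to reading CSV files. For example, the read_portfolio() function reads a file containing portfolio data, and the read_prices() function reads a file containing price data. Both functions contain a lot of low-level "fiddly" details, and they have much in common: each opens a file, processes it with the csv module, and converts various fields to new types.

If you really had to parse a large number of files, you would probably want to clean some of this up and make it more general purpose. That is our goal.

Start this exercise by opening the file Work/fileparse.py, which is where we will write our code.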

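Exercise 3.3: Reading CSV Files

To start, let's focus only on the problem of reading a CSV file into a list of dictionaries. In fileparse.py, define a function that looks like this: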

    # fileparse.py
    import csv

    def parse_csv(filename):
        '''
        Parse a CSV file into a list of records
        '''
        with open(filename) as f:
            rows = csv.reader(f)

            # Read the file headers
            headers = next(rows)
            records = []
            for row in rows:
                if not row:    # Skip rows with no data
                    continue
                record = dict(zip(headers, row))
                records.append(record)

        return records

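This function reads a CSV file into a list of dictionaries while hiding the details of opening the file, processing it with the csv module, skipping blank lines, and so on.

Try it out:

Hint: python3 -i fileparse.py.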

    >>> portfolio = parse_csv('Data/portfolio.csv')
    >>> portfolio
    [{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
    >>>

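This is good, except that you can't do any useful calculation with the data, because everything is represented as a string. We will fix this shortly, but first let's keep building on this foundation.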
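To make the limitation concrete, here is a small sketch. The `records` list is hypothetical sample data shaped like the output of parse_csv() above, not something read from Data/portfolio.csv. It shows that arithmetic on the raw string fields fails and that every calculation needs an explicit conversion.

```python
# Hypothetical records shaped like the output of parse_csv() above.
records = [
    {'name': 'AA', 'shares': '100', 'price': '32.20'},
    {'name': 'IBM', 'shares': '50', 'price': '91.10'},
]

# Multiplying two strings raises TypeError, so the raw fields
# cannot be used directly in arithmetic.
try:
    records[0]['shares'] * records[0]['price']
except TypeError as err:
    print('Cannot compute with strings:', err)

# Converting each field by hand works, but the conversions have to be
# repeated wherever the data is used.
total_cost = sum(int(r['shares']) * float(r['price']) for r in records)
print(f'Total cost: {total_cost:.2f}')
```

Repeating int() and float() at every point of use is exactly the kind of detail a general-purpose parser can take over.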

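Exercise 3.4: Building a Column Selector

In many cases you are only interested in selected columns of a CSV file, not all of the data. Modify the parse_csv() function so that it optionally lets the user pick out arbitrary columns, as follows: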

    >>> # Read all of the data
    >>> portfolio = parse_csv('Data/portfolio.csv')
    >>> portfolio
    [{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]

    >>> # Read only some of the data
    >>> shares_held = parse_csv('Data/portfolio.csv', select=['name','shares'])
    >>> shares_held
    [{'name': 'AA', 'shares': '100'}, {'name': 'IBM', 'shares': '50'}, {'name': 'CAT', 'shares': '150'}, {'name': 'MSFT', 'shares': '200'}, {'name': 'GE', 'shares': '95'}, {'name': 'MSFT', 'shares': '50'}, {'name': 'IBM', 'shares': '100'}]
    >>>

An example of a column selector was given in Exercise 2.23.

However, here is one way to do it:

    # fileparse.py
    import csv

    def parse_csv(filename, select=None):
        '''
        Parse a CSV file into a list of records
        '''
        with open(filename) as f:
            rows = csv.reader(f)

            # Read the file headers
            headers = next(rows)

            # If a column selector was given, find indices of the specified columns.
            # Also narrow the set of headers used for resulting dictionaries
            if select:
                indices = [headers.index(colname) for colname in select]
                headers = select
            else:
                indices = []

            records = []
            for row in rows:
                if not row:    # Skip rows with no data
                    continue
                # Filter the row if specific columns were selected
                if indices:
                    row = [ row[index] for index in indices ]

                # Make a dictionary
                record = dict(zip(headers, row))
                records.append(record)

        return records

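There are a few tricky bits in this part. Probably the most important one is the mapping of the column selections to row indices. For example, suppose the input file had the following headers: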

    >>> headers = ['name', 'date', 'time', 'shares', 'price']
    >>>

Now, suppose the selected columns were as follows:

    >>> select = ['name', 'shares']
    >>>

To perform the selection correctly, you have to map the selected column names to column indices in the file. That is what this step does:

    >>> indices = [headers.index(colname) for colname in select ]
    >>> indices
    [0, 3]
    >>>

In other words, "name" is column 0 and "shares" is column 3.
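One detail worth knowing about this mapping: list.index() raises ValueError when the requested name is not in the list, so a misspelled column name stops the parse right at this step. The sketch below wraps the mapping in a helper, `column_indices`, which is a hypothetical name introduced here for illustration and is not part of fileparse.py. It reports all unknown names at once.

```python
headers = ['name', 'date', 'time', 'shares', 'price']

def column_indices(headers, select):
    '''
    Map selected column names to their positions in headers.
    (Hypothetical helper, shown only to illustrate the mapping step.)
    '''
    # Collect every requested name that the file does not provide.
    missing = [colname for colname in select if colname not in headers]
    if missing:
        raise ValueError(f'Unknown column(s): {missing}')
    return [headers.index(colname) for colname in select]

print(column_indices(headers, ['name', 'shares']))   # [0, 3]

try:
    column_indices(headers, ['name', 'volume'])
except ValueError as err:
    print('Selection failed:', err)
```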

When a row of data is read from the file, the indices are used to filter it:

    >>> row = ['AA', '6/11/2007', '9:50am', '100', '32.20' ]
    >>> row = [ row[index] for index in indices ]
    >>> row
    ['AA', '100']
    >>>

Exercise 3.5: Performing Type Conversion

Modify the parse_csv() function so that it optionally allows type conversions to be applied to the returned data. For example:

    >>> portfolio = parse_csv('Data/portfolio.csv', types=[str, int, float])
    >>> portfolio
    [{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]

    >>> shares_held = parse_csv('Data/portfolio.csv', select=['name', 'shares'], types=[str, int])
    >>> shares_held
    [{'name': 'AA', 'shares': 100}, {'name': 'IBM', 'shares': 50}, {'name': 'CAT', 'shares': 150}, {'name': 'MSFT', 'shares': 200}, {'name': 'GE', 'shares': 95}, {'name': 'MSFT', 'shares': 50}, {'name': 'IBM', 'shares': 100}]
    >>>
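The text stops at the exercise statement, so what follows is only one possible sketch, not a reference solution. It assumes that `types` is a list of conversion callables matched position by position to the columns that remain after selection, which is what the session above suggests. The sample data and temporary file are invented for the demonstration.

```python
import csv
import os
import tempfile

def parse_csv(filename, select=None, types=None):
    '''
    Parse a CSV file into a list of records, with optional
    column selection and type conversion.
    '''
    with open(filename, newline='') as f:
        rows = csv.reader(f)
        headers = next(rows)

        # Map the selected column names to indices, as in Exercise 3.4.
        if select:
            indices = [headers.index(colname) for colname in select]
            headers = select
        else:
            indices = []

        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            if indices:
                row = [row[index] for index in indices]
            # Assumed semantics: types[i] converts the i-th remaining column.
            if types:
                row = [func(val) for func, val in zip(types, row)]
            records.append(dict(zip(headers, row)))

    return records

# Demonstration on invented sample data written to a temporary file.
sample = 'name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n'
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'portfolio.csv')
    with open(path, 'w', newline='') as f:
        f.write(sample)
    shares_held = parse_csv(path, select=['name', 'shares'], types=[str, int])

print(shares_held)
```

Note that zip() stops at the shorter of its inputs, so in this sketch a `types` list shorter than the row silently drops the trailing columns. A stricter version might check the lengths and raise an error instead.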

Please credit the source when reposting: https://www.heiqu.com/wsxpxj.html
