Title: Practical Data Science Cookbook (Second Edition)
Authors: Prabhanjan Tattar, Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort, Abhijit Dasgupta
Last updated: 2025-04-04 19:03:20

Getting ready

To conduct our analyses, we're going to create a few helper functions that we will use throughout this chapter. Application-oriented analyses typically produce reusable code that performs single, well-defined tasks so that it can adapt quickly to changing data or analysis requirements. In particular, let's create two helper functions: one that extracts the data for a particular country and one that creates a time series from a particular set of rows:

In [22]: import csv  # needed for csv.DictReader
    ...: 
    ...: def dataset(path, country="United States"):
    ...:     """
    ...:     Extract the data for the country provided. Default is United States.
    ...:     """
    ...:     with open(path, 'r') as csvfile:
    ...:         reader = csv.DictReader(csvfile)
    ...:         # Keep only the rows whose Country field matches
    ...:         for row in filter(lambda row: row["Country"] == country, reader):
    ...:             yield row
    ...: 

In [23]: def timeseries(data, column):
    ...:     """
    ...:     Creates a year based time series for the given column.
    ...:     """
    ...:     for row in filter(lambda row: row[column], data):
    ...:         yield (int(row["Year"]), row[column])
    ...:

The first function iterates through the dataset using csv.DictReader and keeps only the rows for a particular country using Python's built-in filter function. The second function leverages the fact that there is a Year column to create a time series for the data: a generator that yields (year, value) tuples for a particular column in the dataset, skipping rows in which that column is empty. Because csv.DictReader reads every field as a string, the values are yielded as strings and must be converted before any numeric work. Note that this function should be passed a generator created by the dataset function. We can now use these two functions for a series of analyses across any column for a single country.
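
To see how the two helpers chain together, the following is a minimal, self-contained sketch rather than part of the recipe itself. It repeats both functions so that the block runs on its own, writes a tiny CSV file to a temporary location, and builds a time series from it. The sample rows are made up purely for illustration, and the Value column, the write_sample helper, and the demo helper are hypothetical names that do not come from the chapter's dataset:

```python
import csv
import os
import tempfile


def dataset(path, country="United States"):
    """Extract the rows for the given country (repeated from the recipe)."""
    with open(path, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in filter(lambda row: row["Country"] == country, reader):
            yield row


def timeseries(data, column):
    """Yield (year, value) tuples, skipping rows where the column is empty."""
    for row in filter(lambda row: row[column], data):
        yield (int(row["Year"]), row[column])


# Made-up sample rows for illustration only; "Value" is a placeholder column.
SAMPLE = [
    {"Country": "United States", "Year": "2001", "Value": "10"},
    {"Country": "Canada", "Year": "2001", "Value": "99"},
    {"Country": "United States", "Year": "2002", "Value": ""},
    {"Country": "United States", "Year": "2003", "Value": "30"},
]


def write_sample(path):
    """Write the sample rows to a CSV file with a header row (hypothetical helper)."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Country", "Year", "Value"])
        writer.writeheader()
        writer.writerows(SAMPLE)


def demo():
    """Build the time series for the default country from the sample file."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        write_sample(path)
        # dataset() feeds its generator straight into timeseries()
        return list(timeseries(dataset(path), "Value"))
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())  # [(2001, '10'), (2003, '30')]
```

The Canada row is dropped by dataset, and the 2002 row is dropped by timeseries because its Value field is empty. The remaining values are still strings, so a numeric analysis would wrap them in float() or a similar conversion first.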