Book title: Practical Data Science Cookbook (Second Edition)
Authors: Prabhanjan Tattar, Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort, Abhijit Dasgupta
Chapter length: 1001 words
Last updated: 2025-04-04 19:03:20

How to do it...

The following steps will walk us through using the Jinja2 templating library to create flexible and appealing reporting output:

Jinja2 is simple and has familiar Python-esque syntax (though it is not Python). Templates can include logic or control flow, including iteration, conditionals, and formatting, which removes the need to have the data adapt to the template. A simple example is as follows:

In [36]: from jinja2 import Template 
    ...: template = Template(u'Greetings, {{ name }}!') 
    ...: template.render(name='Mr. Praline') 
Out[36]: 'Greetings, Mr. Praline!'
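The greeting above only substitutes a single variable. Here is a minimal sketch of the control flow mentioned earlier. The template loops over rows, branches on a value, and formats numbers itself. The country names, incomes and the 50000 threshold are made-up illustrative values, not data from the recipe.

```python
from jinja2 import Template

# The template iterates, branches, and formats, so the data passed in
# can stay as plain (name, income) tuples.
template = Template(
    "{% for name, income in rows %}"
    "{{ name }}: {{ '%.1f' | format(income) }}"
    "{% if income > 50000 %} (high){% endif %}"
    "{% if not loop.last %}; {% endif %}"
    "{% endfor %}"
)

# Illustrative values only
rows = [("France", 41234.5), ("Italy", 52000.0)]
report = template.render(rows=rows)
print(report)  # France: 41234.5; Italy: 52000.0 (high)
```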

In practice, however, we should decouple our templates from our Python code and store them as text files on our system. Jinja2 provides a central Environment object, which holds configuration and global objects and loads templates either from the filesystem or from other Python packages:

In [37]: from jinja2 import Environment, PackageLoader, FileSystemLoader 
    ...: # 'templates' should be the path to the templates folder 
    ...: # as written, it is assumed to be in the current directory 
    ...: jinjaenv = Environment(loader=FileSystemLoader('templates')) 
    ...: template = jinjaenv.get_template('report.html')

Here, the Jinja2 environment uses a FileSystemLoader, which takes a search path (a directory or a list of directories) in which to look for template files. In this case, that is the templates directory relative to the current working directory. Another commonly used loader is PackageLoader, which loads templates from a directory inside an installed Python package. The template called report.html is fetched from the templates directory and is ready to be rendered.
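You can try the loader without setting up a project. The following minimal sketch writes a throwaway report.html into a temporary templates folder and loads it through FileSystemLoader. The temporary directory and the one-line template are assumptions made so the example runs anywhere. In the recipe, the templates folder already exists on disk.

```python
import os
import tempfile

from jinja2 import Environment, FileSystemLoader

# Create a temporary "templates" folder holding a one-line report.html
templates_dir = os.path.join(tempfile.mkdtemp(), "templates")
os.makedirs(templates_dir)
with open(os.path.join(templates_dir, "report.html"), "w", encoding="utf-8") as f:
    f.write("<h1>{{ title }}</h1>")

# FileSystemLoader searches the given directory (or list of directories)
jinjaenv = Environment(loader=FileSystemLoader(templates_dir))
print(jinjaenv.list_templates())  # ['report.html']

template = jinjaenv.get_template("report.html")
html = template.render(title="Top Incomes")
print(html)  # <h1>Top Incomes</h1>
```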

Rendering can be as simple as template.render(context), which returns a Unicode string of generated output. The context should be a Python dictionary whose keys are the variable names used in the template. Alternatively, the context can be passed as keyword arguments: template.render({'name':'Terry'}) is equivalent to template.render(name='Terry'). For large templates (and large datasets), however, the template.stream method is more memory-efficient. It does not render the entire template into one string at once. Instead, it returns a TemplateStream that produces the output piece by piece as it is iterated, like a generator.
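Here is a short sketch of both calling styles and of streaming. The template string, name and count are placeholders chosen for illustration.

```python
from jinja2 import Template

template = Template("Hello, {{ name }}! You have {{ count }} new reports.")

# A context dictionary and keyword arguments produce the same output
by_dict = template.render({"name": "Terry", "count": 3})
by_kwargs = template.render(name="Terry", count=3)
assert by_dict == by_kwargs

# stream() returns a TemplateStream; iterating it yields rendered pieces
# instead of building one large string up front
stream = template.stream(name="Terry", count=3)
pieces = list(stream)
print("".join(pieces) == by_dict)  # True
```

The exact way the output is split into pieces is an implementation detail. If many tiny writes are a concern, TemplateStream.enable_buffering(size=...) groups them into larger chunks.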

The stream can then be written, with its dump method, to a file on disk or to any file-like object, for example one that sends data over the network:

template.stream(items=['a', 'b', 'c'], name='Eric').dump('report-2013.html')
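dump accepts either a filename or an object with a write method, so the same call can target an in-memory buffer. In this sketch, io.StringIO stands in for a socket or an HTTP response body, and the list template is illustrative.

```python
import io

from jinja2 import Template

template = Template(
    "<ul>{% for item in items %}<li>{{ item }}</li>{% endfor %}</ul> by {{ name }}"
)

# dump() writes the rendered pieces to the file-like object as they are produced
buffer = io.StringIO()
template.stream(items=["a", "b", "c"], name="Eric").dump(buffer)
print(buffer.getvalue())  # <ul><li>a</li><li>b</li><li>c</li></ul> by Eric
```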

This seemingly simple technique is powerful, especially when combined with Python's json module. Data serialized with json.dumps can be placed directly into JavaScript snippets for interactive charting and visualization libraries on the web, such as D3, Highcharts, and Google Charts.
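Here is a minimal sketch of that hand-off. json.dumps turns Python lists, dictionaries and None into JSON text, which a template can drop into a JavaScript assignment. The series shape mimics a Highcharts series ({'name': ..., 'data': [...]}), and the numbers are made up.

```python
import json

from jinja2 import Template

# Made-up values; None marks a missing year and becomes JSON null
series = [{"name": "France", "data": [30000.0, None, 31000.0]}]

payload = json.dumps(series)
print(payload)  # [{"name": "France", "data": [30000.0, null, 31000.0]}]

# A plain Template does not autoescape, so the JSON text is inserted verbatim
snippet = Template("var series = {{ series }};").render(series=payload)
print(snippet)
```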

Let's take a look at a complete example using the world's top incomes dataset:

In [38]: import csv 
    ...: import json 
    ...: from datetime import datetime 
    ...: from itertools import groupby 
    ...: from operator import itemgetter 
    ...: from jinja2 import Environment, FileSystemLoader 
    ...:  

In [39]: def dataset(path, include): 
    ...:     column = 'Average income per tax unit' 
    ...:     with open(path, 'r', newline='') as csvfile: 
    ...:         reader = csv.DictReader(csvfile) 
    ...:         key = itemgetter('Country') 
    ...:         # Use groupby: memory-efficient collection by country 
    ...:         # (groupby merges only consecutive rows, so the CSV 
    ...:         # must be sorted by country) 
    ...:         for country, values in groupby(reader, key=key): 
    ...:             # Only yield countries that are included 
    ...:             if country in include: 
    ...:                 # Skip rows whose income value is empty 
    ...:                 yield country, [(int(value['Year']), 
    ...:                                  float(value[column])) 
    ...:                                 for value in values if value[column]] 
    ...:  

In [40]: def extract_years(data): 
    ...:     # Every year that appears for any country 
    ...:     for country in data: 
    ...:         for value in country[1]: 
    ...:             yield value[0] 
    ...:  

In [41]: def extract_series(data, years): 
    ...:     # One Highcharts series per country; None fills missing years 
    ...:     for country, values in data: 
    ...:         values = dict(values) 
    ...:         yield { 
    ...:             'name': country, 
    ...:             'data': [values.get(year) for year in years], 
    ...:         } 
    ...:  

In [42]: def write(context): 
    ...:     path = "report-%s.html" % datetime.now().strftime("%Y%m%d") 
    ...:     jinjaenv = Environment(loader=FileSystemLoader('templates')) 
    ...:     template = jinjaenv.get_template('report.html') 
    ...:     template.stream(context).dump(path) 
    ...:  

In [43]: def main(source): 
    ...:     # Select countries to include 
    ...:     include = ("United States", "France", "Italy", 
    ...:                "Germany", "South Africa", "New Zealand") 
    ...:     # Get dataset from CSV 
    ...:     data = list(dataset(source, include)) 
    ...:     # Sort the years so the chart categories and series values align 
    ...:     years = sorted(set(extract_years(data))) 
    ...:     # Generate context 
    ...:     context = { 
    ...:         'title': "Average Income per Family, %i - %i" 
    ...:                  % (min(years), max(years)), 
    ...:         'years': json.dumps(years), 
    ...:         'countries': [v[0] for v in data], 
    ...:         'series': json.dumps(list(extract_series(data, years))), 
    ...:     } 
    ...:     write(context) 
    ...:  

In [44]: if __name__ == '__main__': 
    ...:     source = '../data/income_dist.csv' 
    ...:     main(source)

This is a lot of code, so let's go through it step by step. The dataset function reads the Average income per tax unit column from our CSV file. It keeps only the countries in the include set and skips rows where the income value is empty. It uses a functional iteration helper, groupby, which collects consecutive rows of the CSV file that share the same Country value, so we get one dataset per country. This relies on the file being sorted (or at least grouped) by country. Both itemgetter and groupby are common, memory-efficient helpers from Python's standard library (the operator and itertools modules) that do a lot of heavy lifting during large-scale data analyses.
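To see what groupby and itemgetter do on their own, here is a minimal sketch that runs them over a tiny in-memory CSV. The rows and values are invented for illustration and are not taken from the real income dataset. The last lines show why sorted input matters.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# A tiny in-memory stand-in for the income CSV (invented values)
SAMPLE = """Country,Year,Average income per tax unit
France,2000,30000
France,2001,
France,2002,31000
Italy,2000,25000
Italy,2002,26000
"""

column = "Average income per tax unit"
reader = csv.DictReader(io.StringIO(SAMPLE))
grouped = {}
for country, rows in groupby(reader, key=itemgetter("Country")):
    # Skip rows whose income cell is empty, as dataset() does
    grouped[country] = [(int(r["Year"]), float(r[column])) for r in rows if r[column]]
print(grouped)

# groupby merges only *consecutive* items: unsorted input splits a country
keys = [k for k, _ in groupby(["France", "Italy", "France"])]
print(keys)  # ['France', 'Italy', 'France']
```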

After we extract the dataset, we use two helper functions. The first, extract_years, generates every year value from every country. This is necessary because not all countries have values for every year in the dataset. We also use the result to determine the range of years for our template. The second, extract_series, normalizes the data by filling in None for any year in which a country has no value. This way, every country's series lines up with the same sorted list of years.
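Here is one way to write these two helpers, as a standalone sketch. It assumes that data is a list of (country, [(year, value), ...]) pairs, the shape dataset() yields. The sample values are invented. Sorting the years matters because the chart pairs the nth category with the nth value of each series.

```python
def extract_years(data):
    # Every year that appears for any country
    for _country, values in data:
        for year, _income in values:
            yield year


def extract_series(data, years):
    # One Highcharts-style series per country; None marks a missing year
    for country, values in data:
        by_year = dict(values)
        yield {"name": country, "data": [by_year.get(year) for year in years]}


# Invented sample in the shape dataset() yields
data = [
    ("France", [(2000, 30000.0), (2002, 31000.0)]),
    ("Italy", [(2000, 25000.0), (2001, 25500.0)]),
]
years = sorted(set(extract_years(data)))
series = list(extract_series(data, years))
print(years)   # [2000, 2001, 2002]
print(series)
```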

The write() function wraps the template-writing functionality. It creates a file called report-{date}.html, adding the current date (in YYYYMMDD format) for reference. It also creates the Environment object, loads the report template, and streams the rendered output to disk. Finally, the main function gathers the data, builds the context, and passes it to write().
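To check the date-stamped filename and the streamed output without touching the real templates folder, here is a hypothetical variant, write_report. It takes the environment, output directory and date as parameters, whereas the recipe's write() hard-codes them. A DictLoader supplies an in-memory stand-in for report.html.

```python
import os
import tempfile
from datetime import datetime

from jinja2 import DictLoader, Environment


def write_report(context, env, outdir, when=None):
    # Hypothetical variant of write(): env, outdir and when are parameters
    # here so the sketch is testable; the recipe hard-codes them.
    when = when or datetime.now()
    path = os.path.join(outdir, "report-%s.html" % when.strftime("%Y%m%d"))
    env.get_template("report.html").stream(context).dump(path)
    return path


# In-memory stand-in for templates/report.html
env = Environment(loader=DictLoader({"report.html": "<title>{{ title }}</title>"}))
outdir = tempfile.mkdtemp()
path = write_report({"title": "Average Income"}, env, outdir, when=datetime(2013, 11, 5))
print(os.path.basename(path))  # report-20131105.html
```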

The report template is as follows:

<html> 
<head> 
    <title>{{ title }}</title> 
</head> 
<body> 
    <div class="container"> 
        <h1>{{ title }}</h1> 
        <div id="countries"> 
            <ul> 
            {% for country in countries %} 
                <li>{{ country }}</li> 
            {% endfor %} 
            </ul>  
        </div> 
        <div id="chart"></div> 
    </div> 

    <script type="text/javascript" 
        src="http://codeorigin.jquery.com/jquery-2.0.3.min.js"></script> 
    <script src="http://code.highcharts.com/highcharts.js"></script> 
    <script type="text/javascript"> 
        $.noConflict(); 
        jQuery(document).ready(function($) { 
            $('#chart').highcharts({ 
                xAxis: { 
                    categories: JSON.parse('{{ years }}'), 
                    tickInterval: 5, 
                }, 
                yAxis: { 
                    title: { 
                        text: "2008 USD" 
                    } 
                }, 
                plotOptions: { 
                    line: { 
                        marker: { 
                            enabled: false 
                        } 
                    } 
                }, 
                series: JSON.parse('{{ series }}') 
            }); 
        }); 
    </script> 
</body> 
</html>
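The template embeds the JSON strings inside single-quoted JavaScript literals, as in JSON.parse('{{ series }}'). That works for this dataset, but a value containing an apostrophe would end the JavaScript string early. The following minimal sketch shows the problem and one alternative, Jinja2's built-in tojson filter, which escapes <, >, & and ' so that the output can be used directly as a JavaScript literal. The country name is a hypothetical value chosen to trigger the issue.

```python
import json

from jinja2 import Template

# Hypothetical value containing an apostrophe
series = [{"name": "Cote d'Ivoire", "data": [1.0, None]}]

# Quoting json.dumps output by hand leaves the apostrophe inside '...'
fragile = Template("JSON.parse('{{ series }}')").render(series=json.dumps(series))
print(fragile)

# tojson serializes the Python object itself and escapes ' as \u0027
safe = Template("var series = {{ series|tojson }};").render(series=series)
print(safe)
```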

The preceding template fills in the title in the correct spots and then creates an unordered list of the countries included in our dataset. It also uses Highcharts, a JavaScript charting library configured through an options object, to create an interactive chart. Note that we use JSON.parse to parse the JSON strings that we dumped in Python. json.dumps converts Python types to their JSON equivalents (for example, None becomes null, which Highcharts draws as a gap in the line), and JSON.parse turns them back into JavaScript arrays and objects. When you open the report in a browser, it should look something like the following screenshot: