Book: Expert Data Visualization
Author: Jos Dirksen
Updated: 2025-04-04 19:31:10

Sanitizing and getting the data

For this example, we'll download data from https://www.ssa.gov/oact/babynames/limits.html. This site provides data on baby names in the US since 1880, both at the national level and per state. Download the national dataset. Once you've downloaded and extracted it, you'll see one file per year:

$ ls -1 
NationalReadMe.pdf 
yob1880.txt 
yob1881.txt 
yob1882.txt 
yob1883.txt 
yob1884.txt 
yob1885.txt 
... 
yob2013.txt 
yob2014.txt 
yob2015.txt

As you can see, we have data from 1880 through 2015. For this example, I've used the data from 2015, but any other year will work just as well. Now let's look a bit closer at the data:

$ cat yob2015.txt 
Emma,F,20355 
Olivia,F,19553 
Sophia,F,17327 
Ava,F,16286 
Isabella,F,15504 
Mia,F,14820 
Abigail,F,12311 
Emily,F,11727 
Charlotte,F,11332 
Harper,F,10241 
... 
Zynique,F,5 
Zyrielle,F,5 
Noah,M,19511 
Liam,M,18281 
Mason,M,16535 
Jacob,M,15816 
William,M,15809 
Ethan,M,14991 
James,M,14705 
Alexander,M,14460 
Michael,M,14321 
Benjamin,M,13608 
Elijah,M,13511 
Daniel,M,13408

In this data, we've got a large number of rows, where each row shows a name, the sex (M or F), and the number of babies given that name. All the girls' names are listed first, followed by all the boys' names. The data is already pretty usable as it is, so we don't need to do much processing before we can use it. The only thing we do is add a header to this file, so that it looks like this:

name,sex,amount 
Emma,F,20355 
Olivia,F,19553 
Sophia,F,17327 
Ava,F,16286
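
You can add the header by hand in any text editor. If you'd rather script it, the following is a minimal sketch in plain JavaScript. The addHeader function and the HEADER constant are hypothetical helpers made up for this illustration, not part of the book's sources. The sketch also trims trailing whitespace and skips empty lines, which is an extra assumption about what "sanitizing" should cover here:

```javascript
// Sketch: prepend a header row so the first line names the columns.
// HEADER and addHeader are hypothetical names used only for this illustration.
const HEADER = 'name,sex,amount';

function addHeader(csvText, header = HEADER) {
  // Split into lines (handling Windows line endings), trim each line,
  // and drop empty lines.
  const lines = csvText
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.length > 0);

  // Leave the data alone if the header is already present.
  if (lines[0] === header) {
    return lines.join('\n') + '\n';
  }
  return [header, ...lines].join('\n') + '\n';
}

// Example with the first rows of yob2015.txt:
const raw = 'Emma,F,20355\nOlivia,F,19553\nSophia,F,17327\n';
console.log(addHeader(raw));
```

To apply this to the real file under Node.js, you could read yob2015.txt with fs.readFileSync, pass the text through addHeader, and write the result back with fs.writeFileSync. Because the function checks for an existing header, running it twice doesn't add the header twice.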

Adding this header makes parsing the data with D3 a little bit easier, since D3's default way of parsing CSV data assumes the first line is a header. The sanitized data we use in this example can be found here: <DVD3>/src/chapter-01/data/yob2015.txt.
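
To make it concrete what "the first line is a header" buys us, here is a small dependency-free sketch of header-based parsing. It is not D3's parser: parseWithHeader is a hypothetical stand-in written for this illustration, and it assumes the fields contain no quotes or embedded commas, which holds for the rows shown above. In a real D3 project you would use D3's own CSV support (for example, d3.csvParse from the d3-dsv module) instead:

```javascript
// Sketch only: a stand-in for header-based CSV parsing, not D3's implementation.
// parseWithHeader is a hypothetical helper; it assumes unquoted, comma-separated fields.
function parseWithHeader(csvText) {
  const [headerLine, ...rows] = csvText.trim().split(/\r?\n/);
  // The header row supplies the property names for every record.
  const columns = headerLine.split(',').map(c => c.trim());

  return rows.map(row => {
    const values = row.split(',').map(v => v.trim());
    const record = {};
    columns.forEach((col, i) => { record[col] = values[i]; });
    // Parsed values are strings, so convert the count to a number explicitly.
    record.amount = Number(record.amount);
    return record;
  });
}

const sample = 'name,sex,amount\nEmma,F,20355\nOlivia,F,19553\nNoah,M,19511\n';
const names = parseWithHeader(sample);
console.log(names[0]); // { name: 'Emma', sex: 'F', amount: 20355 }
console.log(names.filter(d => d.sex === 'M').length); // 1
```

Each row becomes an object whose keys come from the header line, so the rest of the code can refer to d.name, d.sex, and d.amount rather than to column positions. Note the explicit conversion of amount: CSV parsers, D3's included, hand back strings unless you convert the values yourself.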