Preface

"Data analysis is Python's killer app"

--Unknown

This book is the follow-up to Python Data Analysis. The obvious question is, "What does this new book add?", as Python Data Analysis is pretty great (or so I like to believe) already. This book, Python Data Analysis Cookbook, is targeted at slightly more experienced Pythonistas. A year has passed, so we are using newer versions of software, as well as software libraries that I didn't cover in Python Data Analysis. Also, I've had time to rethink and research, and as a result I decided the following:

  • I need to have a toolbox in order to make my life easier and increase reproducibility. I called the toolbox dautil and made it available via PyPI (it can be installed with pip/easy_install).
  • My soul-searching exercise led me to believe that I need to make it easier to obtain and install the required software. I published a Docker image (pydacbk) with some of the software we need via Docker Hub. You can read more about the setup in Chapter 1, Laying the Foundation for Reproducible Data Analysis, and the online chapter. The image is not ideal because it grew quite large, so I had to make some tough decisions about what to include. Since the image is not really part of the book, please contact me directly if you have any issues with it. However, please keep in mind that I can't change the image drastically.
  • This book uses the IPython Notebook, which has become a standard tool for analysis. I have given some related tips in the online chapter and in other books I have written.
  • I am using Python 3 with very few exceptions because Python 2 will not be maintained after 2020.