Book Review: Data Science at the Command Line By Jeroen Janssens

A book review of the latest edition of Jeroen Janssens’ Data Science at the Command Line
Data Science at the Command Line: Obtain, Scrub, Explore, and Model Data with Unix Power Tools, by Jeroen Janssens, is now in its second edition. The book demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You will learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, Janssens provides a Docker image packed with over 80 tools, useful whether you work with Windows, macOS, or Linux.

You will quickly discover why the command line is an agile, scalable, and extensible technology. Even if you are comfortable processing data with Python or R, you will learn how to improve your data science workflow by leveraging the command line’s power. This book is ideal for data scientists, analysts, software and machine learning engineers, and system administrators.

You will learn how to obtain data from websites, APIs, databases, and spreadsheets; scrub text, CSV, HTML, XML, and JSON files; explore data, compute descriptive statistics, and create visualizations; manage your data science workflow; create reusable command-line tools from one-liners and existing Python or R code; parallelize and distribute data-intensive pipelines; and model data with dimensionality reduction, clustering, regression, and classification algorithms.

This book is a good fit for students, young data scientists, analysts, and anyone with an interest in the data science field. It is well written, and if you want to understand more about data science, it is a great place to start. You can expect great work from someone who has taught data science long enough to know the ins and outs of the technology. Data Science at the Command Line is published by O’Reilly Media.