Python for Data Science For Dummies

Unleash the power of Python for your data analysis projects with For Dummies!

Python is the preferred programming language for data scientists and combines the best features of MATLAB, Mathematica, and R into libraries specific to data analysis and visualization. Python for Data Science For Dummies shows you how to take advantage of Python programming to acquire, organize, process, and analyze large amounts of information and use basic statistics concepts to identify trends and patterns. You'll get familiar with the Python development environment, manipulate data, design compelling visualizations, and solve scientific computing challenges as you work your way through this user-friendly guide.
* Covers the fundamentals of Python data analysis programming and statistics to help you build a solid foundation in data science concepts like probability, random distributions, hypothesis testing, and regression models
* Explains objects, functions, modules, and libraries and their role in data analysis
* Walks you through some of the most widely used libraries, including NumPy, SciPy, Beautiful Soup, pandas, and matplotlib

Whether you're new to data analysis or just new to Python, Python for Data Science For Dummies is your practical guide to getting a grip on data overload and doing interesting things with the oodles of information you uncover.
John Paul Mueller, consultant, application developer, writer, and technical editor, has written over 600 articles and 97 books. His topics range from programming to home security. Luca Massaron is a data scientist and a research director specializing in multivariate statistical analysis, machine learning, and customer insight. He is a pioneer of Web audience analysis in Italy and was named one of the top ten data scientists at competitions by kaggle.com.
  • Introduction

    About This Book

    Foolish Assumptions

    Icons Used in This Book

    Beyond the Book

    Where to Go from Here

    Part I: Getting Started with Python for Data Science

    Chapter 1: Discovering the Match between Data Science and Python

    Defining the Sexiest Job of the 21st Century

    Considering the emergence of data science

    Outlining the core competencies of a data scientist

    Linking data science and big data

    Understanding the role of programming

    Creating the Data Science Pipeline

    Preparing the data

    Performing exploratory data analysis

    Learning from data

    Visualizing

    Obtaining insights and data products

    Understanding Python's Role in Data Science

    Considering the shifting profile of data scientists

    Working with a multipurpose, simple, and efficient language

    Learning to Use Python Fast

    Loading data

    Training a model

    Viewing a result

    Chapter 2: Introducing Python's Capabilities and Wonders

    Why Python?

    Grasping Python's core philosophy

    Discovering present and future development goals

    Working with Python

    Getting a taste of the language

    Understanding the need for indentation

    Working at the command line or in the IDE

    Performing Rapid Prototyping and Experimentation

    Considering Speed of Execution

    Visualizing Power

    Using the Python Ecosystem for Data Science

    Accessing scientific tools using SciPy

    Performing fundamental scientific computing using NumPy

    Performing data analysis using pandas

    Implementing machine learning using Scikit-learn

    Plotting the data using matplotlib

    Parsing HTML documents using Beautiful Soup

    Chapter 3: Setting Up Python for Data Science

    Considering the Off-the-Shelf Cross-Platform Scientific Distributions

    Getting Continuum Analytics Anaconda

    Getting Enthought Canopy Express

    Getting pythonxy

    Getting WinPython

    Installing Anaconda on Windows

    Installing Anaconda on Linux

    Installing Anaconda on Mac OS X

    Downloading the Datasets and Example Code

    Using IPython Notebook

    Defining the code repository

    Understanding the datasets used in this book

    Chapter 4: Reviewing Basic Python

    Working with Numbers and Logic

    Performing variable assignments

    Doing arithmetic

    Comparing data using Boolean expressions

    Creating and Using Strings

    Interacting with Dates

    Creating and Using Functions

    Creating reusable functions

    Calling functions in a variety of ways

    Using Conditional and Loop Statements

    Making decisions using the if statement

    Choosing between multiple options using nested decisions

    Performing repetitive tasks using for

    Using the while statement

    Storing Data Using Sets, Lists, and Tuples

    Performing operations on sets

    Working with lists

    Creating and using Tuples

    Defining Useful Iterators

    Indexing Data Using Dictionaries

    Part II: Getting Your Hands Dirty with Data

    Chapter 5: Working with Real Data

    Uploading, Streaming, and Sampling Data

    Uploading small amounts of data into memory

    Streaming large amounts of data into memory

    Sampling data

    Accessing Data in Structured Flat-File Form

    Reading from a text file

    Reading CSV delimited format

    Reading Excel and other Microsoft Office files

    Sending Data in Unstructured File Form

    Managing Data from Relational Databases

    Interacting with Data from NoSQL Databases

    Accessing Data from the Web

    Chapter 6: Conditioning Your Data

    Juggling between NumPy and pandas

    Knowing when to use NumPy

    Knowing when to use pandas

    Validating Your Data

    Figuring out what's in your data

    Removing duplicates

    Creating a data map and data plan

    Manipulating Categorical Variables

    Creating categorical variables

    Renaming levels

    Combining levels

    Dealing with Dates in Your Data

    Formatting date and time values

    Using the right time transformation

    Dealing with Missing Data

    Finding the missing data

    Encoding missingness

    Imputing missing data

    Slicing and Dicing: Filtering and Selecting Data

    Slicing rows

    Slicing columns

    Dicing

    Concatenating and Transforming

    Adding new cases and variables

    Removing data

    Sorting and shuffling

    Aggregating Data at Any Level

    Chapter 7: Shaping Data

    Working with HTML Pages

    Parsing XML and HTML

    Using XPath for data extraction

    Working with Raw Text

    Dealing with Unicode

    Stemming and removing stop words

    Introducing regular expressions

    Using the Bag of Words Model and Beyond

    Understanding the bag of words model

    Working with n-grams

    Implementing TF-IDF transformations

    Working with Graph Data

    Understanding the adjacency matrix

    Using NetworkX basics

    Chapter 8: Putting What You Know in Action

    Contextualizing Problems and Data

    Evaluating a data science problem

    Researching solutions

    Formulating a hypothesis

    Preparing your data

    Considering the Art of Feature Creation

    Defining feature creation

    Combining variables

    Understanding binning and discretization

    Using indicator variables

    Transforming distributions

    Performing Operations on Arrays

    Using vectorization

    Performing simple arithmetic on vectors and matrices

    Performing matrix vector multiplication

    Performing matrix multiplication

    Part III: Visualizing the Invisible

    Chapter 9: Getting a Crash Course in MatPlotLib

    Starting with a Graph

    Defining the plot

    Drawing multiple lines and plots

    Saving your work

    Setting the Axis, Ticks, Grids

    Getting the axes

    Formatting the axes

    Adding grids

    Defining the Line Appearance

    Working with line styles

    Using colors

    Adding markers

    Using Labels, Annotations, and Legends

    Adding labels

    Annotating the chart

    Creating a legend

    Chapter 10: Visualizing the Data

    Choosing the Right Graph

    Showing parts of a whole with pie charts

    Creating comparisons with bar charts

    Showing distributions using histograms

    Depicting groups using box plots

    Seeing data patterns using scatterplots

    Creating Advanced Scatterplots

    Depicting groups

    Showing correlations

    Plotting Time Series

    Representing time on axes

    Plotting trends over time

    Plotting Geographical Data

    Visualizing Graphs

    Developing undirected graphs

    Developing directed graphs

    Chapter 11: Understanding the Tools

    Using the IPython Console

    Interacting with screen text

    Changing the window appearance

    Getting Python help

    Getting IPython help

    Using magic functions

    Discovering objects

    Using IPython Notebook

    Working with styles

    Restarting the kernel

    Restoring a checkpoint

    Performing Multimedia and Graphic Integration

    Embedding plots and other images

    Loading examples from online sites

    Obtaining online graphics and multimedia

    Part IV: Wrangling Data

    Chapter 12: Stretching Python's Capabilities

    Playing with Scikit-learn

    Understanding classes in Scikit-learn

    Defining applications for data science

    Performing the Hashing Trick

    Using hash functions

    Demonstrating the hashing trick

    Working with deterministic selection

    Considering Timing and Performance

    Benchmarking with timeit

    Working with the memory profiler

    Running in Parallel

    Performing multicore parallelism

    Demonstrating multiprocessing

    Chapter 13: Exploring Data Analysis

    The EDA Approach

    Defining Descriptive Statistics for Numeric Data

    Measuring central tendency

    Measuring variance and range

    Working with percentiles

    Defining measures of normality

    Counting for Categorical Data

    Understanding frequencies

    Creating contingency tables

    Creating Applied Visualization for EDA

    Inspecting boxplots

    Performing t-tests after boxplots

    Observing parallel coordinates

    Graphing distributions

    Plotting scatterplots

    Understanding Correlation

    Using covariance and correlation

    Using nonparametric correlation

    Considering chi-square for tables

    Modifying Data Distributions

    Using the normal distribution

    Creating a Z-score standardization

    Transforming other notable distributions

    Chapter 14: Reducing Dimensionality

    Understanding SVD

    Looking for dimensionality reduction

    Using SVD to measure the invisible

    Performing Factor and Principal Component Analysis

    Considering the psychometric model

    Looking for hidden factors

    Using components, not factors

    Achieving dimensionality reduction

    Understanding Some Applications

    Recognizing faces with PCA

    Extracting Topics with NMF

    Recommending movies

    Chapter 15: Clustering

    Clustering with K-means

    Understanding centroid-based algorithms

    Creating an example with image data

    Looking for optimal solutions

    Clustering big data

    Performing Hierarchical Clustering

    Moving Beyond the Round-Shaped Clusters: DBScan

    Chapter 16: Detecting Outliers in Data

    Considering Detection of Outliers

    Finding more things that can go wrong

    Understanding anomalies and novel data

    Examining a Simple Univariate Method

    Leveraging on the Gaussian distribution

    Making assumptions and checking out

    Developing a Multivariate Approach

    Using principal component analysis

    Using cluster analysis

    Automating outliers detection with SVM

    Part V: Learning from Data

    Chapter 17: Exploring Four Simple and Effective Algorithms

    Guessing the Number: Linear Regression

    Defining the family of linear models

    Using more variables

    Understanding limitations and problems

    Moving to Logistic Regression

    Applying logistic regression

    Considering when classes are more

    Making Things as Simple as Naïve Bayes

    Finding out that Naïve Bayes isn't so naïve

    Predicting text classifications

    Learning Lazily with Nearest Neighbors

    Predicting after observing neighbors

    Choosing your k parameter wisely

    Chapter 18: Performing Cross-Validation, Selection, and Optimization

    Pondering the Problem of Fitting a Model

    Understanding bias and variance

    Defining a strategy for picking models

    Dividing between training and test sets

    Cross-Validating

    Using cross-validation on k folds

    Sampling stratifications for complex data

    Selecting Variables Like a Pro

    Selecting by univariate measures

    Using a greedy search

    Pumping Up Your Hyperparameters

    Implementing a grid search

    Trying a randomized search

    Chapter 19: Increasing Complexity with Linear and Nonlinear Tricks

    Using Nonlinear Transformations

    Doing variable transformations

    Creating interactions between variables

    Regularizing Linear Models

    Relying on Ridge regression (L2)

    Using the Lasso (L1)

    Leveraging regularization

    Combining L1 & L2: Elasticnet

    Fighting with Big Data Chunk by Chunk

    Determining when there is too much data

    Implementing Stochastic Gradient Descent

    Understanding Support Vector Machines

    Relying on a computational method

    Fixing many new parameters

    Classifying with SVC

    Going nonlinear is easy

    Performing regression with SVR

    Creating a stochastic solution with SVM

    Chapter 20: Understanding the Power of the Many

    Starting with a Plain Decision Tree

    Understanding a decision tree

    Creating classification and regression trees

    Making Machine Learning Accessible

    Working with a Random Forest classifier

    Working with a Random Forest regressor

    Optimizing a Random Forest

    Boosting Predictions

    Knowing that many weak predictors win

    Creating a gradient boosting classifier

    Creating a gradient boosting regressor

    Using GBM hyper-parameters

    Part VI: The Part of Tens

    Chapter 21: Ten Essential Data Science Resource Collections

    Gaining Insights with Data Science Weekly

    Obtaining a Resource List at U Climb Higher

    Getting a Good Start with KDnuggets

    Accessing the Huge List of Resources on Data Science Central

    Obtaining the Facts of Open Source Data Science from Masters

    Locating Free Learning Resources with Quora

    Receiving Help with Advanced Topics at Conductrics

    Learning New Tricks from the Aspirational Data Scientist

    Finding Data Intelligence and Analytics Resources at AnalyticBridge

    Zeroing In on Developer Resources with Jonathan Bower

    Chapter 22: Ten Data Challenges You Should Take

    Meeting the Data Science London + Scikit-learn Challenge

    Predicting Survival on the Titanic

    Finding a Kaggle Competition that Suits Your Needs

    Honing Your Overfit Strategies

    Trudging Through the MovieLens Dataset

    Getting Rid of Spam Emails

    Working with Handwritten Information

    Working with Pictures

    Analyzing Amazon.com Reviews

    Interacting with a Huge Graph

    Index
Binding Paperback
Pages 432
Publication date 21.07.2015
Language English
ISBN 978-1-118-84418-2
Series For Dummies
Publisher John Wiley & Sons, Inc.
Dimensions (L/W/H) 23.4/18.9/2.5 cm
Weight 586 g
Figures with illustrations
Edition 1st edition
Book (paperback, English)
Fr. 44.90
incl. statutory VAT
Ready to ship within 3 weeks, free shipping