Libros UCLV { BETA }

A large collection of books within our reach

We have:
1413 books,
262755 downloads, and
1106 contributors!

16 matches found

Hadoop: The Definitive Guide

Big Data


47 views | 82 downloads | 2015-04-07 21:20:52 | cbustillo

Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you'll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You'll learn about recent changes to Hadoop, and explore new case studies on Hadoop's role in healthcare systems and genomics data processing.

Field Guide to Hadoop

Big Data


35 views | 71 downloads | 2015-04-07 21:30:07 | cbustillo

If your organization is about to enter the world of big data, you not only need to decide whether Apache Hadoop is the right platform to use, but also which of its many components are best suited to your task. This field guide makes the exercise manageable by breaking down the Hadoop ecosystem into short, digestible sections. You'll quickly understand how Hadoop's projects, subprojects, and related technologies work together. Each chapter introduces a different topic - such as core technologies or data transfer - and explains why certain components may or may not be useful for particular needs. When it comes to data, Hadoop is a whole new ballgame, but with this handy reference, you'll have a good grasp of the playing field.

Advanced Analytics with Spark

Advanced Analytics with Spark


57 views | 96 downloads | 2015-11-02 15:54:02 | pecarrazana

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques (classification, collaborative filtering, and anomaly detection, among others) to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find these patterns useful for working on your own data applications.

Fast Data Processing with Spark

Fast Data Processing with Spark


63 views | 108 downloads | 2015-11-02 15:56:32 | pecarrazana

Apache Spark has captured the imagination of the analytics and big data developers, and rightfully so. In a nutshell, Spark enables distributed computing on a large scale in the lab or in production. Till now, the pipeline collect-store-transform was distinct from the Data Science pipeline reason-model, which was again distinct from the deployment of the analytics and machine learning models. Now, with Spark and technologies, such as Kafka, we can seamlessly span the data management and data science pipelines. We can build data science models on larger datasets, requiring not just sample data. However, whatever models we build can be deployed into production (with added work from engineering on the "ilities", of course). It is our hope that this book would enable an engineer to get familiar with the fundamentals of the Spark platform as well as provide hands-on experience on some of the advanced capabilities.

Machine Learning with Spark

Machine Learning with Spark


58 views | 101 downloads | 2015-11-02 15:59:02 | pecarrazana

In recent years, the volume of data being collected, stored, and analyzed has exploded, in particular in relation to the activity on the Web and mobile devices, as well as data from the physical world collected via sensor networks. While previously large-scale data storage, processing, analysis, and modeling was the domain of the largest institutions such as Google, Yahoo!, Facebook, and Twitter, increasingly, many organizations are being faced with the challenge of how to handle a massive amount of data. When faced with this quantity of data and the common requirement to utilize it in real time, human-powered systems quickly become infeasible. This has led to a rise in the so-called big data and machine learning systems that learn from this data to make automated decisions. In answer to the challenge of dealing with ever larger-scale data without any prohibitive cost, new open source technologies emerged at companies such as Google, Yahoo!, Amazon, and Facebook, which aimed at making it easier to handle massive data volumes by distributing data storage and computation across a cluster of computers. The most widespread of these is Apache Hadoop, which made it significantly easier and cheaper to both store large amounts of data (via the Hadoop Distributed File System, or HDFS) and run computations on this data (via Hadoop MapReduce, a framework to perform computation tasks in parallel across many nodes in a computer cluster).

Learning Neo4j

Learning Neo4j. Run blazingly fast queries on complex graph datasets with the power of the Neo4j graph database


38 views | 43 downloads | 2015-11-05 19:49:40 | pecarrazana

Who this book is for: If you are an IT professional or developer who wants to get started in the field of graph databases, this is the book for you. Anyone with prior experience with SQL in the relational database world will very quickly feel at ease with Neo4j and its Cypher query language and learn a lot from this book.

Apache HBase Reference Guide

Apache HBase


16 views | 30 downloads | 2015-11-10 20:19:12 | pecarrazana

This is the official reference guide for the HBase version it ships with. Herein you will find either the definitive documentation on an HBase topic as of its standing when the referenced HBase version shipped, or it will point to the location in Javadoc or JIRA where the pertinent information can be found.

Professional Hadoop Solutions

Hadoop Solutions


27 views | 35 downloads | 2015-11-13 20:47:47 | pecarrazana

It’s not that there is a lack of books about Hadoop. Quite a few have been written, and many of them are very good. So, why this one? Well, when the authors started working with Hadoop, we wished there was a book that went beyond APIs and explained how the many parts of the Hadoop ecosystem work together and can be used to build enterprise-grade solutions. We were looking for a book that walks the reader through the data design and how it impacts implementation, as well as explains how MapReduce works, and how to reformulate specific business problems in MapReduce. We were looking for answers to the following questions: What are MapReduce’s strengths and weaknesses, and how can you customize it to better suit your needs? Why do you need an additional orchestration layer on top of MapReduce, and how does Oozie fit the bill? How can you simplify MapReduce development using domain-specific languages (DSLs)? What is this real-time Hadoop that everyone is talking about, what can it do, and what can it not do? How does it work?

Learning Apache Mahout

Apache Mahout


20 views | 43 downloads | 2015-11-27 16:40:00 | pecarrazana

Learning Apache Mahout is aimed at providing a strong foundation in machine learning using Mahout. This book is ideal for learning the core concepts of machine learning and the basics of Mahout. This book will go from the basics of Mahout and machine learning, to feature engineering and the implementation of various machine learning algorithms in Mahout. Algorithm usage examples will be explained using both the Mahout command line and its Java API. We will conclude the book with two chapters of end-to-end case studies. Ideally, chapters 1, 2 and 3 should be read sequentially, chapters 4 to 8 in any order, and chapters 9 and 10 after chapters 1 to 8 have been completed.

Learning Apache Mahout Classification

Learning Apache Mahout Classification


19 views | 33 downloads | 2015-11-27 16:43:25 | pecarrazana

If you are a data scientist who has some experience with the Hadoop ecosystem and machine learning methods and want to try out classification on large datasets using Mahout, this book is ideal for you. Knowledge of Java is essential. To use the examples in this book, you should have the following software installed on your system: Java 1.6 or higher; Eclipse; Hadoop; Mahout (we will discuss the installation in Chapter 2, Apache Mahout, of this book); and Maven, depending on how you install Mahout.

Pig Design Patterns

Pig Design Patterns


18 views | 30 downloads | 2015-11-27 16:46:36 | pecarrazana

The Pig platform surely is one of these methods. Nevertheless, the power of such a platform is best tapped by extending it efficiently. Extending requires great familiarity with the platform. More importantly, extending is fun when the process of building such extensions is easy. The Pig Latin platform offers great simplicity. However, a practitioner's advice is immensely valuable in leveraging this simplicity to an enterprise's own requirements. This is where I find this book to be very apt. It makes you productive with the platform pretty quickly through very well-researched design patterns. This helps simplify programming in Hadoop and create complex end-to-end enterprise-grade Big Data solutions through a building block and best-pattern approach. This book covers the journey of Big Data from the time it enters the enterprise to its eventual use in analytics, either in the form of a dashboard or a predictive model.

Programming Pig

Programming Pig


25 views | 35 downloads | 2015-11-27 16:48:20 | pecarrazana

This book is intended for Pig programmers, new and old. Those who have never used Pig will find introductory material on how to run Pig and to get them started writing Pig Latin scripts. For seasoned Pig users, this book covers almost every feature of Pig: different modes it can be run in, complete coverage of the Pig Latin language, and how to extend Pig with your own User Defined Functions (UDFs). Even those who have been using Pig for a long time are likely to discover features they have not used before. Being a relatively young project, Pig has changed and grown significantly over the last four years. In that time we have released versions 0.1 through 0.9. This book assumes Pig 0.7 as the base version. Wherever features are only in versions 0.8 or 0.9, this is called out. The biggest change from 0.6 to 0.7 is that load and store function interfaces were rewritten, so Chapter 11 will not be usable by those on 0.6 or earlier versions. However, the rest of the book will still be applicable. Some knowledge of Hadoop will be useful for readers and Pig users. Appendix B provides an introduction to Hadoop and how it works. “Pig on Hadoop” on page 1 walks through a very simple example of a Hadoop job. These sections will be helpful for those not already familiar with Hadoop. Small snippets of Java, Python, and SQL are used in parts of this book. Knowledge of these languages is not required to use Pig, but knowledge of Python and Java will be necessary for some of the more advanced features. Those with a SQL background may find “Comparing query and dataflow languages” on page 4 to be a helpful starting point in understanding the similarities and differences between Pig Latin and SQL.

ZooKeeper

ZooKeeper


21 views | 35 downloads | 2015-11-27 16:51:48 | pecarrazana

Building distributed systems is hard. A lot of the applications people use daily, however, depend on such systems, and it doesn’t look like we will stop relying on distributed computer systems any time soon. Apache ZooKeeper has been designed to mitigate the task of building robust distributed systems. It has been built around core distributed computing concepts, with its main goal to present the developer with an interface that is simple to understand and program against, thus simplifying the task of building such systems. Even with ZooKeeper, the task is not trivial, which leads us to this book. This book will get you up to speed on building distributed systems with Apache ZooKeeper. We start with basic concepts that will quickly make you feel like you’re a distributed systems expert. Perhaps it will be a bit disappointing to see that it is not that simple when we discuss a bunch of caveats that you need to be aware of. But don’t worry; if you understand well the key issues we expose, you’ll be on the right track to building great distributed applications.

Machine Learning in Action

Machine Learning


101 views | 158 downloads | 2015-12-11 18:24:04 | pecarrazana

This book sets out to introduce people to important machine learning algorithms. Tools and applications using these algorithms are introduced to give the reader an idea of how they are used in practice today. A wide selection of machine learning books is available, which discuss the mathematics, but discuss little of how to program the algorithms. This book aims to be a bridge from algorithms presented in matrix form to an actual functioning program. With that in mind, please note that this book is heavy on code and light on mathematics. Audience: What is all this machine learning stuff and who needs it? In a nutshell, machine learning is making sense of data. So if you have data you want to understand, this book is for you. If you want to get data and make sense of it, then this book is for you too. It helps if you are familiar with a few basic programming concepts, such as recursion, and a few data structures, such as trees. It will also help if you have had an introduction to linear algebra and probability, although expertise in these fields is not necessary to benefit from this book. Lastly, the book uses Python, which has been called “executable pseudo code” in the past. It is assumed that you have a basic working knowledge of Python, but do not worry if you are not an expert in Python; it is not difficult to learn.

Learning Apache Kafka, 2nd Ed

This book is for those who want to know about Apache Kafka at a hands-on level; the key audience is those with software development experience but no prior exposure to Apache Kafka or similar technologies. This book is also for enterprise application developers and big data enthusiasts who have worked with other publisher-subscriber-based systems and now want to explore Apache Kafka as a futuristic scalable solution.


15 views | 11 downloads | 2019-05-07 11:37:55 | moliver

In today’s world, real-time information is continuously being generated by applications (business, social, or any other type), and this information needs easy ways to be reliably and quickly routed to multiple types of receivers. Most of the time, applications that produce information and applications that consume this information are well apart and inaccessible to each other. These heterogeneous applications lead to redevelopment in order to provide an integration point between them. Therefore, a mechanism is required for the seamless integration of information from producers and consumers to avoid any kind of application rewriting at either end. Apache Kafka is an open-source stream-processing software platform developed by LinkedIn and donated to the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.

Kafka: The Definitive Guide

It’s an exciting time for Apache Kafka. Kafka is being used by tens of thousands of organizations, including over a third of the Fortune 500 companies. It’s among the fastest growing open source projects and has spawned an immense ecosystem around it. It’s at the heart of a movement towards managing and processing streams of data.


13 views | 12 downloads | 2019-05-07 11:43:58 | moliver

Kafka got its start as an internal infrastructure system we built at LinkedIn. Our observation was really simple: there were lots of databases and other systems built to store data, but what was missing in our architecture was something that would help us to handle the continuous flow of data. Prior to building Kafka, we experimented with all kinds of off-the-shelf options, from messaging systems to log aggregation and ETL tools, but none of them gave us what we wanted. We eventually decided to build something from scratch. Our idea was that instead of focusing on holding piles of data like our relational databases, key-value stores, search indexes, or caches, we would focus on treating data as a continually evolving and ever-growing stream, and build a data system (and indeed a data architecture) oriented around that idea.