Libros UCLV { BETA }

A large collection of books within our reach

We have:
1413 books,
263017 downloads, and
1106 contributors!

7 matches found

Scaling CouchDB

Scaling CouchDB


25 Views | 31 Downloads | 2015-11-02 15:20:23 | pecarrazana

CouchDB is a schema-less database, giving you a great deal of flexibility in designing the document boundaries for your data.

Machine Learning with Spark

Machine Learning with Spark


59 Views | 101 Downloads | 2015-11-02 15:59:02 | pecarrazana

In recent years, the volume of data being collected, stored, and analyzed has exploded, in particular in relation to the activity on the Web and mobile devices, as well as data from the physical world collected via sensor networks. While previously large-scale data storage, processing, analysis, and modeling was the domain of the largest institutions such as Google, Yahoo!, Facebook, and Twitter, increasingly, many organizations are being faced with the challenge of how to handle a massive amount of data. When faced with this quantity of data and the common requirement to utilize it in real time, human-powered systems quickly become infeasible. This has led to a rise in the so-called big data and machine learning systems that learn from this data to make automated decisions. In answer to the challenge of dealing with ever larger-scale data without any prohibitive cost, new open source technologies emerged at companies such as Google, Yahoo!, Amazon, and Facebook, which aimed at making it easier to handle massive data volumes by distributing data storage and computation across a cluster of computers. The most widespread of these is Apache Hadoop, which made it significantly easier and cheaper to both store large amounts of data (via the Hadoop Distributed File System, or HDFS) and run computations on this data (via Hadoop MapReduce, a framework to perform computation tasks in parallel across many nodes in a computer cluster).

Professional Hadoop Solutions

Hadoop Solutions


28 Views | 35 Downloads | 2015-11-13 20:47:47 | pecarrazana

It’s not that there is a lack of books about Hadoop. Quite a few have been written, and many of them are very good. So, why this one? Well, when the authors started working with Hadoop, we wished there was a book that went beyond APIs and explained how the many parts of the Hadoop ecosystem work together and can be used to build enterprise-grade solutions. We were looking for a book that walks the reader through the data design and how it impacts implementation, as well as explains how MapReduce works, and how to reformulate specific business problems in MapReduce. We were looking for answers to the following questions: What are MapReduce’s strengths and weaknesses, and how can you customize it to better suit your needs? Why do you need an additional orchestration layer on top of MapReduce, and how does Oozie fit the bill? How can you simplify MapReduce development using domain-specific languages (DSLs)? What is this real-time Hadoop that everyone is talking about, what can it do, and what can it not do? How does it work?

Apache Hive Essentials

Apache Hive


36 Views | 46 Downloads | 2015-11-13 20:52:40 | pecarrazana

Apache Hive Essentials prepares you for your journey into big data. The first two chapters introduce the background and concepts of the big data domain and walk through setting up and getting familiar with your Hive working environment. The next four chapters guide you through discovering and transforming the value behind big data, using examples and practical skills in the Hive query language. The last four chapters highlight selected advanced topics, such as performance, security, and extensions, to round out the journey.

Programming Hive

Apache Hive


35 Views | 54 Downloads | 2015-11-13 20:56:41 | pecarrazana

However, a challenge remains; how do you move an existing data infrastructure to Hadoop, when that infrastructure is based on traditional relational databases and the Structured Query Language (SQL)? What about the large base of SQL users, both expert database designers and administrators, as well as casual users who use SQL to extract information from their data warehouses? This is where Hive comes in. Hive provides an SQL dialect, called Hive Query Language (abbreviated HiveQL or just HQL) for querying data stored in a Hadoop cluster. SQL knowledge is widespread for a reason; it’s an effective, reasonably intuitive model for organizing and using data. Mapping these familiar data operations to the low-level MapReduce Java API can be daunting, even for experienced Java developers. Hive does this dirty work for you, so you can focus on the query itself. Hive translates most queries to MapReduce jobs, thereby exploiting the scalability of Hadoop, while presenting a familiar SQL abstraction. If you don’t believe us, see “Java Versus Hive: The Word Count Algorithm” on page 10 later in this chapter.

Learning Apache Mahout

Apache Mahout


20 Views | 44 Downloads | 2015-11-27 16:40:00 | pecarrazana

Learning Apache Mahout is aimed at providing a strong foundation in machine learning using Mahout. This book is ideal for learning the core concepts of machine learning and the basics of Mahout. It goes from the basics of Mahout and machine learning to feature engineering and the implementation of various machine learning algorithms in Mahout. Algorithm usage examples will be explained using both the Mahout command line and its Java API. We will conclude the book with two chapters of end-to-end case studies. Ideally, chapters 1, 2, and 3 should be read sequentially, chapters 4 to 8 in any order, and chapters 9 and 10 after chapters 1 to 8 have been completed.

Pig Design Patterns

Pig Design Patterns


19 Views | 30 Downloads | 2015-11-27 16:46:36 | pecarrazana

The Pig platform surely is one of these methods. Nevertheless, the power of such a platform is best tapped by extending it efficiently. Extending requires great familiarity with the platform. More importantly, extending is fun when the process of building such extensions is easy. The Pig Latin platform offers great simplicity. However, a practitioner's advice is immensely valuable in leveraging this simplicity to meet an enterprise's own requirements. This is where I find this book to be very apt. It makes you productive with the platform pretty quickly through very well-researched design patterns. This helps simplify programming in Hadoop and create complex end-to-end enterprise-grade Big Data solutions through a building block and best-pattern approach. This book covers the journey of Big Data from the time it enters the enterprise to its eventual use in analytics, either in the form of a dashboard or a predictive model.