Apache Hive
However, a challenge remains: how do you move an existing data infrastructure to Hadoop, when that infrastructure is based on traditional relational databases and the Structured Query Language (SQL)? What about the large base of SQL users, both expert database designers and administrators, as well as casual users who use SQL to extract information from their data warehouses? This is where Hive comes in. Hive provides an SQL dialect, called Hive Query Language (abbreviated HiveQL or just HQL) for querying data stored in a Hadoop cluster. SQL knowledge is widespread for a reason; it’s an effective, reasonably intuitive model for organizing and using data. Mapping these familiar data operations to the low-level MapReduce Java API can be daunting, even for experienced Java developers. Hive does this dirty work for you, so you can focus on the query itself. Hive translates most queries to MapReduce jobs, thereby exploiting the scalability of Hadoop, while presenting a familiar SQL abstraction. If you don’t believe us, see “Java Versus Hive: The Word Count Algorithm” on page 10 later in this chapter.
HTML5 & CSS3
This book is aimed at web designers and front-end developers who want to learn about the latest generation of browser-based technologies. You should already have at least intermediate knowledge of HTML and CSS, as we won’t be spending any time covering the basics of markup and styles. Instead, we’ll focus on teaching you what new powers are available to you in the form of HTML5 and CSS3. The final two chapters of this book cover some of the new JavaScript APIs that have come to be associated with HTML5. These chapters, of course, require some basic familiarity with JavaScript—but they’re not critical to the rest of the book. If you’re unfamiliar with JavaScript, there’s no harm in skipping over them for now, returning later when you’re better acquainted with it.
Natural Language Processing with Java
What this book covers
Chapter 1, Introduction to NLP, explains the importance and uses of NLP. The NLP techniques used in this chapter are explained with simple examples illustrating their use.
Chapter 2, Finding Parts of Text, focuses primarily on tokenization. This is the first step in more advanced NLP tasks. Both core Java and Java NLP tokenization APIs are illustrated.
Chapter 3, Finding Sentences, proves that sentence boundary disambiguation is an important NLP task. This step is a precursor for many other downstream NLP tasks where text elements should not be split across sentence boundaries. This includes ensuring that all phrases are in one sentence and supporting parts of speech analysis.
Chapter 4, Finding People and Things, covers what is commonly referred to as Named Entity Recognition. This task is concerned with identifying people, places, and similar entities in text. This technique is a preliminary step for processing queries and searches.
Chapter 5, Detecting Parts of Speech, shows you how to detect parts of speech, which are grammatical elements of text, such as nouns and verbs. Identifying these elements is a significant step in determining the meaning of text and detecting relationships within text.
Chapter 6, Classifying Texts and Documents, proves that classifying text is useful for tasks such as spam detection and sentiment analysis. The NLP techniques that support this process are investigated and illustrated.
Chapter 7, Using Parser to Extract Relationships, demonstrates parse trees. A parse tree is used for many purposes, including information extraction. It holds information regarding the relationships between these elements. An example implementing a simple query is presented to illustrate this process.
Chapter 8, Combined Approaches, contains techniques for extracting data from various types of documents, such as PDF and Word files. This is followed by an examination of how the previous NLP techniques can be combined into a pipeline to solve larger problems.
Apache Mahout
Learning Apache Mahout is aimed at providing a strong foundation in machine learning using Mahout. This book is ideal for learning the core concepts of machine learning and the basics of Mahout. This book will go from the basics of Mahout and machine learning, to feature engineering and the implementation of various machine learning algorithms in Mahout. Algorithm usage examples will be explained using both the Mahout command line and its Java API. We will conclude the book with two chapters of end-to-end case studies. Ideally, chapters 1, 2 and 3 should be read sequentially, chapters 4 to 8 in any order, and chapters 9 and 10 after chapters 1 to 8 have been completed.
Learning Apache Mahout Classification
If you are a data scientist who has some experience with the Hadoop ecosystem and machine learning methods and want to try out classification on large datasets using Mahout, this book is ideal for you. Knowledge of Java is essential. To use the examples in this book, you should have the following software installed on your system:
- Java 1.6 or higher
- Eclipse
- Hadoop
- Mahout; we will discuss the installation in Chapter 2, Apache Mahout, of this book
- Maven, depending on how you install Mahout
Pig Design Patterns
The Pig platform surely is one of these methods. Nevertheless, the power of such a platform is best tapped by extending it efficiently. Extending requires great familiarity with the platform. More importantly, extending is fun when the process of building such extensions is easy. The Pig Latin platform offers great simplicity. However, a practitioner's advice is immensely valuable in leveraging this simplicity to an enterprise's own requirement. This is where I find this book to be very apt. It makes you productive with the platform pretty quickly through very well-researched design patterns. This helps simplify programming in Hadoop and create complex end-to-end enterprise-grade Big Data solutions through a building block and best-pattern approach. This book covers the journey of Big Data from the time it enters the enterprise to its eventual use in analytics, either in the form of a dashboard or a predictive model.
Programming Pig
This book is intended for Pig programmers, new and old. Those who have never used Pig will find introductory material on how to run Pig and to get them started writing Pig Latin scripts. For seasoned Pig users, this book covers almost every feature of Pig: different modes it can be run in, complete coverage of the Pig Latin language, and how to extend Pig with your own User Defined Functions (UDFs). Even those who have been using Pig for a long time are likely to discover features they have not used before. Being a relatively young project, Pig has changed and grown significantly over the last four years. In that time we have released versions 0.1 through 0.9. This book assumes Pig 0.7 as the base version. Wherever features are only in versions 0.8 or 0.9, this is called out. The biggest change from 0.6 to 0.7 is that load and store function interfaces were rewritten, so Chapter 11 will not be usable by those on 0.6 or earlier versions. However, the rest of the book will still be applicable. Some knowledge of Hadoop will be useful for readers and Pig users. Appendix B provides an introduction to Hadoop and how it works. “Pig on Hadoop” on page 1 walks through a very simple example of a Hadoop job. These sections will be helpful for those not already familiar with Hadoop. Small snippets of Java, Python, and SQL are used in parts of this book. Knowledge of these languages is not required to use Pig, but knowledge of Python and Java will be necessary for some of the more advanced features. Those with a SQL background may find “Comparing query and dataflow languages” on page 4 to be a helpful starting point in understanding the similarities and differences between Pig Latin and SQL.
Zookeeper
Building distributed systems is hard. A lot of the applications people use daily, however, depend on such systems, and it doesn’t look like we will stop relying on distributed computer systems any time soon. Apache ZooKeeper has been designed to mitigate the task of building robust distributed systems. It has been built around core distributed computing concepts, with its main goal to present the developer with an interface that is simple to understand and program against, thus simplifying the task of building such systems. Even with ZooKeeper, the task is not trivial, which leads us to this book. This book will get you up to speed on building distributed systems with Apache ZooKeeper. We start with basic concepts that will quickly make you feel like you’re a distributed systems expert. Perhaps it will be a bit disappointing to see that it is not that simple when we discuss a bunch of caveats that you need to be aware of. But don’t worry; if you understand well the key issues we expose, you’ll be on the right track to building great distributed applications.
Distributed Graph Algorithms for Computer Networks
Distributed systems consisting of a number of autonomous computing elements connected over a communication network that cooperate to achieve common goals have shown an unprecedented growth in the last few decades, especially in the form of the Grid, the Cloud, mobile ad hoc networks, and wireless sensor networks. Design of algorithms for these systems, namely the distributed algorithms, has become an important research area of computer science, engineering, applied mathematics, and other disciplines as they pose different and usually more difficult problems than the sequential algorithms. A graph can be used to conveniently model a distributed system, and distributed graph algorithms or graph-theoretical distributed algorithms, in the context of this book, are considered as distributed algorithms that make use of some property of the graph that models the distributed system to solve a problem in such systems. This book is about distributed graph algorithms as applied to computer networks with focus on implementation and hopefully without much sacrifice on the theory. It grew out of the need I have witnessed while teaching distributed systems and algorithms courses in the last two decades or so. The main observation was that although there were many books on distributed algorithms, graph theory, and ad hoc networks separately, there did not seem to be any book with detailed focus on the intersection of these three major areas of research. The second observation was the difficulty the students faced when implementing distributed algorithm code although the concepts and the idea of an algorithm in an abstract manner were perceived relatively more comfortably. For example, when and how to synchronize algorithms running on different computing nodes was one of the main difficulties.
In this sense, we have attempted to provide algorithms in ready-to-be-coded format in most cases, showing minor details explicitly to aid the distributed algorithm designer and implementor.
Machine Learning for Hackers
What is machine learning? At the highest level of abstraction, we can think of machine learning as a set of tools and methods that attempt to infer patterns and extract insight from a record of the observable world. For example, if we are trying to teach a computer to recognize the zip codes written on the fronts of envelopes, our data may consist of photographs of the envelopes along with a record of the zip code that each envelope was addressed to. That is, within some context we can take a record of the actions of our subjects, learn from this record, and then create a model of these activities that will inform our understanding of this context going forward. In practice, this requires data, and in contemporary applications this often means a lot of data (perhaps several terabytes). Most machine learning techniques take the availability of such data as given, which means new opportunities for their application in light of the quantities of data that are produced as a product of running modern companies. What is a hacker? Far from the stylized depictions of nefarious teenagers or Gibsonian cyber-punks portrayed in pop culture, we believe a hacker is someone who likes to solve problems and experiment with new technologies. If you’ve ever sat down with the latest O’Reilly book on a new computer language and knuckled out code until you were well past “Hello, World,” then you’re a hacker. Or if you’ve dismantled a new gadget until you understood the entire machinery’s architecture, then we probably mean you, too. These pursuits are often undertaken for no other reason than to have gone through the process and gained some knowledge about the how and the why of an unknown technology.
Machine Learning
This book sets out to introduce people to important machine learning algorithms. Tools and applications using these algorithms are introduced to give the reader an idea of how they are used in practice today. A wide selection of machine learning books is available, which discuss the mathematics, but discuss little of how to program the algorithms. This book aims to be a bridge from algorithms presented in matrix form to an actual functioning program. With that in mind, please note that this book is heavy on code and light on mathematics. Audience: What is all this machine learning stuff and who needs it? In a nutshell, machine learning is making sense of data. So if you have data you want to understand, this book is for you. If you want to get data and make sense of it, then this book is for you too. It helps if you are familiar with a few basic programming concepts, such as recursion and a few data structures, such as trees. It will also help if you have had an introduction to linear algebra and probability, although expertise in these fields is not necessary to benefit from this book. Lastly, the book uses Python, which has been called “executable pseudo code” in the past. It is assumed that you have a basic working knowledge of Python, but do not worry if you are not an expert in Python; it is not difficult to learn.
MapReduce Design Patterns
MapReduce is a computing paradigm for processing data that resides on hundreds of computers, which has been popularized recently by Google, Hadoop, and many others. The paradigm is extraordinarily powerful, but it does not provide a general solution to what many are calling “big data,” so while it works particularly well on some problems, some are more challenging. This book will teach you what problems are amenable to the MapReduce paradigm, as well as how to use it effectively. At first glance, many people do not realize that MapReduce is more of a framework than a tool. You have to fit your solution into the framework of map and reduce, which in some situations might be challenging. MapReduce is not a feature, but rather a constraint. This makes problem solving easier and harder. It provides clear boundaries for what you can and cannot do, making the number of options you have to consider fewer than you may be used to. At the same time, figuring out how to solve a problem with constraints requires cleverness and a change in thinking. Learning MapReduce is a lot like learning recursion for the first time: it is challenging to find the recursive solution to the problem, but when it comes to you, it is clear, concise, and elegant. In many situations you have to be conscious of system resources being used by the MapReduce job, especially inter-cluster network utilization. The tradeoff of being confined to the MapReduce framework is the ability to process your data with distributed computing, without having to deal with concurrency, robustness, scale, and other common challenges. But with a unique system and a unique way of problem solving, come unique design patterns.
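To make the idea of fitting a solution into map and reduce concrete, here is a minimal single-process sketch of word counting written in that shape. It is an illustration only and does not use Hadoop or any real MapReduce API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and the in-memory shuffle step stands in for the grouping that a real framework performs across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key (done in memory here; an assumption
    standing in for the framework's distributed shuffle)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: collapse all values seen for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
```

The constraint described above shows up in the structure: the mapper sees one line at a time and the reducer sees one key at a time, so neither function shares state with other calls, which is what lets a framework run them on separate machines.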
Mathematical Game Theory and Applications
This book offers a combined course of lectures on game theory which the author has delivered for several years in Russian and foreign universities. In addition to classical branches of game theory, our analysis covers modern branches left without consideration in most textbooks on the subject (negotiation models, potential games, parlor games, best choice games, and network games). The fundamentals of mathematical analysis, algebra, and probability theory are the necessary prerequisites for reading. The book can be useful for students specializing in applied mathematics and informatics, as well as economic cybernetics. Moreover, it attracts the mutual interest of mathematicians operating in the field of game theory and experts in the fields of economics, management science, and operations research. Each chapter concludes with a series of exercises intended for better understanding. Some exercises represent open problems for conducting independent investigations. As a matter of fact, stimulating the reader’s own research is the main priority of the book. A comprehensive bibliography will guide the audience in an appropriate scientific direction.
Charts and Graphs for Microsoft Office Excel 2007
The goal of this book is to make you more efficient and effective in creating visual displays of information using Excel. In the early chapters of this book, you will learn how to use the new Excel 2007 charting interface. Chapters 3 through 6 walk you through all the built-in chart types and talk about when you can use each chart type. Chapter 7 discusses creating unusual charts. Chapter 8 covers pivot charts, and Chapter 9 covers creating visual displays of information right in the worksheet. Chapter 10 covers mapping, and Chapter 11 covers the new SmartArt business graphics, as well as Excel 2007’s shape tools. The penultimate chapter presents macro tools you can use to automate the production of charts using Excel VBA. In Chapter 14, you will see several techniques that people may use to stretch the truth with charts. Finally, in Appendix A, I provide you with a list of resources to give you additional help with creating charts and graphs.
Excel 2007: Beyond the Manual
This book is aimed at spreadsheet users who already have some familiarity with previous versions of Microsoft Excel and who want an overview of the modifications and new features being introduced with Microsoft Office 2007. The book is also intended to be a practical guide to anyone wishing to update their Excel skills and progress to the more advanced features of this essential spreadsheet application.
Excel 2007 Data Analysis For Dummies
This book isn’t meant to be read cover to cover like a Dan Brown page-turner. Rather, it’s organized into tiny, no-sweat descriptions of how to do the things that must be done. Hop around and read the chapters that interest you. If you’re the sort of person who, perhaps because of a compulsive bent, needs to read a book cover to cover, that’s fine. I recommend that you delve into the chapters on inferential statistics, however, only if you’ve taken at least a couple of college-level statistics classes. But that caveat aside, feel free. After all, maybe Lost is a rerun tonight.
Django Design Patterns and Best Practices
Django is one of the most popular web frameworks in use today. It powers large websites, such as Pinterest, Instagram, Disqus, and NASA. With a few lines of code, you can rapidly build a functional and secure website that can scale to millions of users. This book attempts to share solutions to several common design problems faced by Django developers. Sometimes, there are several solutions but we often wonder whether there is a recommended approach. Experienced developers frequently use certain idioms while deliberately avoiding certain others. This book is a collection of such patterns and insights. It is organized into chapters each covering a key area of the framework, such as Models, or an aspect of web development, such as Debugging. The focus is on building clean, modular, and more maintainable code. Every attempt has been made to present up-to-date information and use the latest versions. Django 1.7 comes loaded with exciting new features, such as built-in schema migrations and app reloading. Python 3.4 is the bleeding edge of the language with several new modules, such as asyncio. Both have been used here. Superheroes are a constant theme throughout the book. Most of the code examples are about building SuperBook, a social network of superheroes. As a novel way to present the challenges of a web development project, an exciting fictional narrative has been woven into each chapter in the form of story boxes.
Lightweight Django
Why This Book? We wanted to write this book primarily because we love Django. The community is amazing, and there are so many resources to learn about Django and to develop applications using it. However, we also felt like many of these resources, including the official Django documentation, put too much emphasis on the power of Django and not on its decoupled design. Django is a well-written framework, with numerous utilities for building web applications included. What we want this book to highlight is how you can break apart and potentially replace these components to pick and choose what best suits the application you want to build. Similarly, we wanted to break down the typical structure of Django projects and applications. Our goal is to get you to stop asking “how do I do X in Django?” and instead ask “does Django provide anything to help me do X, and if not, is something available in the community?”
Django programming
For some years, web development has evolved through frameworks. Web development has become more efficient and has improved in quality. Django is a very sophisticated and popular framework. A framework is a set of tools designed to facilitate and standardize development. It allows the developer to benefit from very practical tools to minimize the development time. However, developing with frameworks requires knowledge about the framework and its proper usage. This book uses a step-by-step pedagogy to help novice developers learn how to easily deal with the Django framework. The examples in this book explain the development of a simple web tool: a text-based task manager.
Django Web Development
Django, written in Python, is a web application framework designed to build complex web applications quickly without any hassle. It loosely follows the MVC pattern and adheres to the Don't Repeat Yourself principle, which makes a database-driven application efficient and highly scalable, and is by far the most popular and mature Python web framework. This book is a manual that will help you build a simple yet effective Django web application. It starts by introducing Django to you and teaches you how to set it up and code simple programs. You will then learn to build your first Twitter-like application. Later on, you will be introduced to hashtags, Ajax (to enhance the user interface), and tweets. You will then move on to create an administration interface, learn database connectivity, and use third-party libraries. Then, you will learn to debug and deploy Django projects and will also get a glimpse of Django with AngularJS and Elasticsearch. By the end of this book, you will be able to leverage the Django framework to develop a fully functional web application with minimal effort.