By Edward Capriolo, Dean Wampler
Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop's data warehouse infrastructure. You'll quickly learn how to use Hive's SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop's distributed filesystem.
This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You'll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.
- Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
- Customize data formats and storage options, from files to external databases
- Load and extract data from tables, and use queries, grouping, filtering, joining, and other conventional query techniques
- Gain best practices for creating user defined functions (UDFs)
- Learn Hive patterns you should use and anti-patterns you should avoid
- Integrate Hive with other data processing programs
- Use storage handlers for NoSQL databases and other datastores
- Learn the pros and cons of running Hive on Amazon's Elastic MapReduce
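As a taste of the first point above, here is a minimal HiveQL sketch of those DDL operations. The database, table, and column names (`analytics`, `logs`, `daily_hits`) are illustrative, not from the book:

```sql
-- Illustrative HiveQL only; all names here are invented for the example.
CREATE DATABASE IF NOT EXISTS analytics;

CREATE TABLE analytics.logs (ts STRING, url STRING, hits INT);

-- A view over the table; views and indexes get their own coverage later.
CREATE VIEW analytics.daily_hits AS
  SELECT url, SUM(hits) AS total FROM analytics.logs GROUP BY url;

-- Alter an existing table, then tear everything down.
ALTER TABLE analytics.logs ADD COLUMNS (referrer STRING);
DROP VIEW analytics.daily_hits;
DROP TABLE analytics.logs;
DROP DATABASE analytics;
```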
Read Online or Download Programming Hive PDF
Best SQL books
Offering information for database administrators and application developers, this book is a step-by-step guide to learning application development and system administration with SQL, writing efficient SQL code, and fine-tuning server performance. The CD-ROM contains an evaluation copy of SQL Server 7.
Write powerful SQL statements and PL/SQL programs. Learn to access Oracle databases through SQL statements and build PL/SQL programs with guidance from Oracle expert Jason Price. Published by Oracle Press, Oracle Database 11g SQL explains how to retrieve and modify database information, use SQL Plus and SQL Developer, work with database objects, write PL/SQL programs, and much more.
The previous edition of this book established itself as the most complete and understandable treatment of the SQL standard generally available. Many changes have occurred in the SQL standards world since that edition was published, however: a major new part, the Call-Level Interface (SQL/CLI), was added in 1995.
The need to handle increasingly larger data volumes is one factor driving the adoption of a new class of nonrelational "NoSQL" databases. Advocates of NoSQL databases claim they can be used to build systems that are more performant, scale better, and are easier to program. NoSQL Distilled is a concise but thorough introduction to this rapidly emerging technology.
Additional resources for Programming Hive
```java
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
  }
}
```

That was 63 lines of Java code. We won't explain the API details. Here is the same calculation written in HiveQL, which is just 8 lines of code and requires neither compilation nor the creation of a "JAR" (Java ARchive) file:

```sql
CREATE TABLE docs (line STRING);
LOAD DATA INPATH 'docs' OVERWRITE INTO TABLE docs;
CREATE TABLE word_counts AS
SELECT word, count(1) AS count FROM
  (SELECT explode(split(line, '\s')) AS word FROM docs) w
GROUP BY word
ORDER BY word;
```

We'll explain all this HiveQL syntax later. In both examples, the files were tokenized into words using the simplest possible approach: splitting on whitespace boundaries. This approach doesn't properly handle punctuation, it doesn't recognize that singular and plural forms of a word are the same word, and so on. However, it's good enough for our purposes here.

The virtue of the Java API is the ability to customize and fine-tune every detail of an algorithm's implementation. However, most of the time you just don't need that level of control, and managing all those details slows you down considerably. If you're not a programmer, writing Java MapReduce code is out of reach. However, if you already know SQL, learning Hive is relatively straightforward, and many applications are quick and easy to implement.

What's Next

We described the important role that Hive plays in the Hadoop ecosystem. Now let's get started!
However, for the large data sets Hive is designed for, this start-up overhead is trivial compared to the actual processing time.

- See the Apache HBase website, http://hbase.apache.org, and HBase: The Definitive Guide by Lars George (O'Reilly).
- See the Cassandra website, http://cassandra.apache.org/, and High Performance Cassandra Cookbook by Edward Capriolo (Packt).
- See the DynamoDB website, http://aws.amazon.com/dynamodb/.
- See https://cwiki.apache.org/Hive/. It's worth bookmarking the wiki link, though, because the wiki contains some more obscure information we won't cover here.
- In case you're not a developer, a "Hello World" program is the traditional first program you write when learning a new language or tool set.
- Apache Hadoop word count: http://wiki.apache.org/hadoop/WordCount.
- See Hadoop: The Definitive Guide by Tom White for the details.
- There is one other minor difference: the Hive query hardcodes a path to the data, whereas the Java code takes the path as an argument. In Chapter 2, we'll learn how to use Hive variables in scripts to avoid hardcoding such details.

Chapter 2. Getting Started

Let's install Hadoop and Hive on our personal workstation. This is a convenient way to learn and experiment with Hadoop. Then we'll discuss how to configure Hive for use on Hadoop clusters.