Saltmätargatan 8A, 113 59 Stockholm

+46 (0) 8 410 55 700 info@middlecon.se

Netezza makes way for Sailfish

IBM Integrated Analytics System, commonly nicknamed Sailfish, has finally launched. We now have a solution that is stronger, broader and more useful than Netezza, one where the cloud and on-prem go hand in hand. We at Middlecon are thrilled about it! Why? Because we see the whole technology taking a giant step forward. But why listen only to me?

I’ll hand the floor to David Birmingham, who has tested Sailfish, and to his article.

This is no ordinary release. IBM has integrated the Db2 BLU engine with the Netezza MPP, so we can now build and issue portable queries with a common, consistent experience. This is quite an achievement, and kudos to all the engineers who participated.

Power Frames vs Blade Servers

Under the hardware covers we now have Power machines. This matters for many good reasons, the best one being horizontal elasticity. Most Enzees know that whenever we want to upgrade to more powerful hardware, say from one rack to two, we have to bring the two-rack system into the shop, copy the data from the one-rack to the two-rack, and only then are we off and running.

With Sailfish, we simply add a frame, run a few simple configuration commands, let the data rebalance across the new capacity, and off we go. Simple and painless. How cool is that?

In this hardware scenario, dataslices and zone maps (now synopsis tables) behave the same. Distribution and organization follow the same rules. Enzees will experience no changes in these core areas.
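To picture what “the data rebalances” means, here is a minimal Python sketch of hash distribution under a toy model: each row goes to a dataslice chosen by hashing its distribution-key value, and adding a frame simply raises the number of dataslices. The MD5-plus-modulo hash, the slice counts and the function names (`dataslice_for`, `distribute`, `moved_fraction`) are hypothetical; this does not describe how Sailfish actually hashes or rebalances internally.

```python
import hashlib

def dataslice_for(key: str, num_slices: int) -> int:
    """Toy model: map a distribution-key value to a dataslice by hashing it."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices

def distribute(keys, num_slices: int) -> list:
    """Count how many rows land on each dataslice."""
    counts = [0] * num_slices
    for k in keys:
        counts[dataslice_for(k, num_slices)] += 1
    return counts

def moved_fraction(keys, old_slices: int, new_slices: int) -> float:
    """Fraction of rows whose dataslice changes when capacity grows."""
    moved = sum(1 for k in keys
                if dataslice_for(k, old_slices) != dataslice_for(k, new_slices))
    return moved / len(keys)

if __name__ == "__main__":
    # Hypothetical keys and slice counts: 4 slices for one frame, 8 for two
    keys = [f"customer-{i}" for i in range(100_000)]
    print("one frame :", distribute(keys, 4))
    print("two frames:", distribute(keys, 8))
    print(f"rows that move: {moved_fraction(keys, 4, 8):.0%}")
```

In this toy model, going from 4 to 8 slices moves roughly half of the rows, because each key either stays on its slice or moves to its one “partner” slice. The point is that the distribution rule itself is unchanged; only the number of targets grows, and the rebalance does the copying for us.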

Columnar tables

Db2 BLU already gained radical speed from columnar storage, but on Netezza-oriented MPPs the effect is even more profound. Netezza tables are row-oriented, so the “read” operation takes a page from disk into the FPGA, where the desired columns are extracted from the rows and undesired rows are filtered out of the stream.

In columnar mode, pages store columns rather than rows. When we ask for certain columns, the pages holding the other columns are never touched. So we read even less from disk, and even less data reaches the CPU.
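A rough way to see the difference is to count pages. The Python sketch below assumes a hypothetical table, fixed column widths and a hypothetical page size, and ignores compression and page headers; it only compares how many pages a query touching two columns must read under each layout.

```python
import math

PAGE_BYTES = 128 * 1024  # hypothetical page size, for illustration only

# Hypothetical table: column name -> fixed width in bytes (no compression modelled)
COLUMNS = {"order_id": 8, "customer_id": 8, "order_date": 4, "amount": 8, "note": 100}

def row_store_pages(num_rows: int, widths: dict) -> int:
    """Row layout: each page holds whole rows, so a scan reads every page
    no matter which columns the query asks for."""
    row_bytes = sum(widths.values())
    return math.ceil(num_rows * row_bytes / PAGE_BYTES)

def column_store_pages(num_rows: int, widths: dict, wanted: list) -> int:
    """Columnar layout: each column lives in its own pages, so only the
    pages of the requested columns are read."""
    return sum(math.ceil(num_rows * widths[c] / PAGE_BYTES) for c in wanted)

if __name__ == "__main__":
    n = 10_000_000
    query_cols = ["customer_id", "amount"]
    print("row store pages:   ", row_store_pages(n, COLUMNS))
    print("column store pages:", column_store_pages(n, COLUMNS, query_cols))
```

For 10 million rows this comes to 9,766 pages for the row layout versus 1,222 pages for the two requested columns, about an eighth of the reads in this simplified model.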

Solid-state drives

Oh yeah, you heard that right. In current Netezza, an electro-mechanical drive will not only eventually burn out; even in normal operation, the read-head has to seek each page on the disk, fetch it, and juggle other read operations to make the best use of the read-head over the spinning disk. Even the disk layout is optimized to keep user data on the outer third of the platter, where it moves past the head fastest. All of this can serialize queries at the read-head and is a potential source of concurrency bottlenecks.

With solid-state drives, read times are radically faster, and any process can get at the data without waiting on a read-head. Because the drives are solid-state, mechanical breakdown is not even on the radar. More importantly, they read and write data at phenomenal speed compared to a hard drive.

Where disk read speed used to be the number-one drag on a query, we will now likely see the bottleneck shift to other areas of the machine. Don’t get me wrong: it still takes time to read from solid-state storage, so we should not forgo zone maps (er, synopsis tables now) to reduce and filter the total amount we read. This is just good stewardship of the CPUs and other resources. Just because we “can” read a lot of data faster doesn’t mean we “should”; we should filter and reduce the data arriving at the CPU for the most efficient queries.
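The idea behind zone maps, or synopsis tables, can be sketched in a few lines of Python. This is a simplified model, not IBM’s implementation: we keep a min/max summary per block of column values and skip any block whose range cannot satisfy the predicate. The block size, the `Synopsis` class and the helper names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Synopsis:
    """Hypothetical min/max summary for one block of column values."""
    lo: int
    hi: int

def build_synopses(values, block_size):
    """Split a column into fixed-size blocks and record each block's min/max."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return blocks, [Synopsis(min(b), max(b)) for b in blocks]

def range_scan(blocks, synopses, lo, hi):
    """Return values in [lo, hi] plus the number of blocks actually read."""
    matches, blocks_read = [], 0
    for block, syn in zip(blocks, synopses):
        if syn.hi < lo or syn.lo > hi:
            continue  # summary proves no value here can match, so skip the read
        blocks_read += 1
        matches.extend(v for v in block if lo <= v <= hi)
    return matches, blocks_read

if __name__ == "__main__":
    # Values loaded in ascending order (e.g. dates arriving day by day)
    column = list(range(1_000_000))
    blocks, synopses = build_synopses(column, block_size=10_000)
    rows, read = range_scan(blocks, synopses, 250_000, 259_999)
    print(f"{len(rows)} matching rows, {read} of {len(blocks)} blocks read")
```

Here only one block out of a hundred is read. If the same values had been loaded in random order, nearly every block’s min/max range would overlap the predicate and almost nothing could be skipped, which is why data organization still matters even when the reads themselves are fast.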

Common SQL Engine

A common SQL engine means SQL statement portability and consistency across platforms. Libraries such as SQL Toolkit, Inza and Fluid Query are now baked into the engine and available at our fingertips from the moment we power up.

We’ll have a more consistent experience and less guesswork about functional and operational behavior.

And we’ll get seamless integration with the Data Science Experience (DSX), machine learning, and a range of other offerings IBM has already announced or has in the queue.

Ease of configuration

As we worked through our POC, the IBM team assigned to us was amazingly helpful. They knew where all the hooks were to tweak this or tune that, because the common engine and its underlying metadata are pervasive and well understood. Whenever we had a question or thought we’d hit an issue, the solution was invariably a simple configuration change.

This speaks volumes about the effort, thought and insight poured into this release. IBM has thought through the wide variety of priorities across these platforms and seamlessly delivered the ones that matter most.

Expect to hear good things as this rollout proceeds.