Hadoop Weaknesses and Where Teradata Aster Sees the Big Data Money
An interesting post on the Teradata Aster blog indirectly emphasizes the weaknesses of the Hadoop platform:
- Make the platform and tools easier to use for managing and curating data. Otherwise, garbage in = garbage out, and you will get garbage analytics.
- Provide rich analytics functions out of the box. Each line of programming cuts your reachable audience by 50%.
- Provide tools to update or delete data. Otherwise, data consistency will drift away from truth as history accumulates.
- Provide applications to leverage data and find answers relevant to the business. Otherwise, the cost of DIY applications is too high to influence the business, and they won't get built.
It's difficult to argue against these points, but they are not insurmountable. I'd even say that once Hadoop deployments become operationally simpler (I think the Apache community, Cloudera, and Hortonworks are already working on this), Hadoop will see even more adoption, and contributions addressing points 2 to 4 will follow shortly.
Yet another interesting part of the post is the pair of "equations" describing the two environments:
big clusters = big administration = big programs = big friction = low influence (Hadoop)
big data = small clusters = easy administration = big analytics = big influence (ideal/Teradata Aster)
I think these reveal how Teradata Aster is positioning its solutions and where it sees itself making money in the Big Data market. It goes like this: "we can make a lot of money if we offer a platform with lower complexity and operational costs and higher productivity, leading to better business results". This is a sound strategy, and competitors from the Hadoop space would do well to focus on these same aspects, which are essential to wide adoption.
Original title and link: Hadoop Weaknesses and Where Teradata Aster Sees the Big Data Money (NoSQL database©myNoSQL)
The Fantastic 12 of 2012: Behind the Scenes of Managed Self-Service BI
We're back with a new episode of The Fantastic 12 of 2012: Behind the Scenes Blog Series, where we're providing unique insights from the SQL Server Engineering Team as they developed SQL Server 2012. This week we are jumping ahead to Number 6 of The Fantastic 12 of SQL Server 2012 (PDF), and we'll revisit Number 5 in the weeks ahead. Be sure to catch the first four episodes here.
In this new episode John Hancock, Principal Program Manager, provides some interesting insights into a major design decision about the new modeling capabilities, including Key Performance Indicators, Hierarchies, and Perspectives, and about where in SQL Server 2012 those capabilities ought to go. Should they go into the professional environment, or is there another approach? Find out how the team addressed and solved this challenge in the episode below!
Don't forget that The Fantastic 12 of #SQL2012 Twitter Contest runs every Thursday at 10:30am PT, where we're giving away brand-new SQL Server T-shirts selected by the SQL Family.
Fantastic 12 of SQL Server 2012
6 Managed Self-Service BI
Gain insight and oversight
- PowerPivot for SharePoint: Balance the need to monitor, manage, and govern the data and analytics end users create with IT dashboards and controls that help IT monitor end user activity, data source usage, and gather performance metrics from servers.
Enable IT Efficiency
- End user created, IT managed: SQL Server 2012 bridges the gap between end user created BI applications and IT managed corporate solutions by providing the ability to import PowerPivot models into Analysis Services so that they can be professionally managed and transformed into corporate grade solutions.
- Ease of administration through SharePoint: Enable end user alerting from reports published to SharePoint and benefit from the ease of consolidated management through the SharePoint 2010 Central Administration.
- SQL Azure Reporting: Extend rich user insights to even more people with SQL Azure Reporting that removes the need for deploying and maintaining a reporting infrastructure.
Big Data Episode 1: Overview for the Boss
The boss needs to present a big data strategy to the CEO. But what's it all about? And above all, what's the value to the business? In this video, two team members give him an overview and then get to work filling in the details.
May 23 Live Webcast: Oracle Database Appliance Best Practices
Simplify Database Management with Oracle Database Appliance Deployment Scenarios
Business users increasingly demand 24x7 availability of their data while IT departments face the challenge of ensuring maximum availability while operating with limited budgets.
By deploying Oracle Database Appliance, organizations can benefit from a reliable system that significantly reduces the time spent on routine system administration and maintenance, lowering operational costs, and allowing IT personnel to focus on higher value activities.
Using proven deployment best practices, midsize customers and enterprise departments alike can quickly integrate Oracle Database Appliance into their backup, test, development, and production environments. And since Oracle Database Appliance is based on Intel® Xeon® processors, organizations can ensure a high level of performance and scalability.
Join Oracle Database Appliance experts Tammy Bednar, David Swanger, and Intel expert Fabrizio Giamello for this live Webcast and learn how to:
- Achieve a high quality of service at the lowest cost
- Reduce up-front investment in hardware and software
- Implement best practices across a multitude of deployment scenarios
Register today and get answers to your questions live from the experts.
Licensing: Could It Be Simpler?
In case you think licensing is easy, read this post by Alex Gorbachev explaining how remote mirroring, backup, and cold failover come with their own licensing implications[1]. My thoughts went from "it's probably me", to "this can't be true", to "not only will you need an army of people to set things up, but also an army to understand what you need to pay for".
By the way, I consider licensing an important part of the experience of a product. The more complicated it is, the less I feel like trying the product, even if feature-wise it comes close to my requirements.
[1] The post refers to Oracle licensing, but I'd venture to say that you could probably find the same simple licensing system in many other places.
Original title and link: Licensing: Could It Be Simpler? (NoSQL database©myNoSQL)
Iqbal Goralwalla Wows the Audience with DB2 9.7 Fix Pack "Pearls" on DB2Night Show™
When using the Task Parallel Library, Wait() is a BAD warning sign
Take a look at the following code:
public static Task ParseAsync(IPartialDataAccess source, IPartialDataAccess seed, Stream output, IEnumerable<RdcNeed> needList)
{
return Task.Factory.StartNew(() =>
{
foreach (var item in needList)
{
switch (item.BlockType)
{
case RdcNeedType.Source:
source.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength)).Wait();
break;
case RdcNeedType.Seed:
seed.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength)).Wait();
break;
default:
throw new NotSupportedException();
}
}
});
}
Do you see the problem in here?
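This code came out of a code review comment about improper use of async in a project. The changes resulted in a lot of methods returning Task, but not in any measurable improvement in how asynchronous the codebase actually was: each Wait() call blocks a thread-pool thread until the copy finishes, so the loop is still synchronous, just running on a different thread.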
The problem is that when you need to work with such things in C# 4.0, you have to do some annoying things to get the code to work properly. In particular, this method was modified to be:
public static Task ParseAsync(IPartialDataAccess source, IPartialDataAccess seed, Stream output, IList<RdcNeed> needList, int position = 0)
{
if(position>= needList.Count)
{
return new CompletedTask();
}
var item = needList[position];
Task task;
switch (item.BlockType)
{
case RdcNeedType.Source:
task = source.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength));
break;
case RdcNeedType.Seed:
task = seed.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength));
break;
default:
throw new NotSupportedException();
}
return task.ContinueWith(resultTask =>
{
if (resultTask.IsFaulted || resultTask.IsCanceled)
resultTask.Wait(); // throws
return ParseAsync(source, seed, output, needList, position + 1);
}).Unwrap();
}
This code is more complex, but it actually makes proper use of the TPL. We have changed the loop into a recursive function, so we can use ContinueWith to start the next iteration of the loop only after the current copy has completed, without blocking a thread while we wait.
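The recursive pattern can be pulled out into a reusable helper. This is a minimal sketch rather than code from the project: `Sequential.RunAsync` is a hypothetical name, each step is assumed to be a `Func<Task>` that starts one asynchronous operation, and it sticks to APIs available in .NET 4.0 (`ContinueWith`, `TaskExtensions.Unwrap`, `TaskCompletionSource<T>`).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class Sequential
{
    // Hypothetical helper: starts each step only after the previous step's task
    // has completed, without blocking a thread on Wait() between steps.
    public static Task RunAsync(IList<Func<Task>> steps, int position = 0)
    {
        if (position >= steps.Count)
        {
            // .NET 4.0 has no Task.FromResult or Task.CompletedTask, so build an
            // already-completed task by hand.
            var done = new TaskCompletionSource<object>();
            done.SetResult(null);
            return done.Task;
        }

        Task current;
        try
        {
            current = steps[position]();
        }
        catch (Exception e)
        {
            // A step that throws synchronously becomes a faulted task.
            var failed = new TaskCompletionSource<object>();
            failed.SetException(e);
            return failed.Task;
        }

        // ContinueWith yields Task<Task>; Unwrap turns it into a Task that
        // completes when the rest of the chain completes.
        return current.ContinueWith(previous =>
        {
            // The previous task has already finished here, so Wait() does not
            // block; on a faulted or canceled task it simply rethrows, which
            // faults the overall chain instead of moving on to the next step.
            if (previous.IsFaulted || previous.IsCanceled)
                previous.Wait();
            return RunAsync(steps, position + 1);
        }).Unwrap();
    }
}
```

Like the method above, a faulted step stops the chain and faults the returned task; this sketch also treats a canceled step as a failure. Because ContinueWith schedules each continuation on the thread pool by default, the recursion does not grow the call stack.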
And no, I can't wait to get to C# 5.0 and have proper await support.
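For comparison, this is roughly how the original loop could look with C# 5.0's async/await. It is a sketch under assumptions: `RdcNeed`, `RdcNeedType`, and `IPartialDataAccess` are hypothetical minimal stand-ins for the project's types (the `ulong` offsets and the `CopyToAsync` parameter names are guesses), and `ByteArrayAccess` is a hypothetical in-memory implementation added only to make the example runnable. It needs .NET 4.5 or later for `Stream.WriteAsync`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical stand-ins for the project's types, reduced to what ParseAsync uses.
public enum RdcNeedType { Source, Seed }

public class RdcNeed
{
    public RdcNeedType BlockType { get; set; }
    public ulong FileOffset { get; set; }
    public ulong BlockLength { get; set; }
}

public interface IPartialDataAccess
{
    Task CopyToAsync(Stream target, long from, long length);
}

public static class NeedListParser
{
    // The loop stays a loop: each await releases the thread until the copy
    // finishes, instead of blocking it the way Wait() does.
    public static async Task ParseAsync(IPartialDataAccess source, IPartialDataAccess seed,
                                        Stream output, IEnumerable<RdcNeed> needList)
    {
        foreach (var item in needList)
        {
            long offset = Convert.ToInt64(item.FileOffset);
            long length = Convert.ToInt64(item.BlockLength);
            switch (item.BlockType)
            {
                case RdcNeedType.Source:
                    await source.CopyToAsync(output, offset, length);
                    break;
                case RdcNeedType.Seed:
                    await seed.CopyToAsync(output, offset, length);
                    break;
                default:
                    throw new NotSupportedException();
            }
        }
    }
}

// Hypothetical in-memory implementation, only for trying the parser out.
public class ByteArrayAccess : IPartialDataAccess
{
    private readonly byte[] data;
    public ByteArrayAccess(byte[] data) { this.data = data; }

    public Task CopyToAsync(Stream target, long from, long length)
    {
        // Writes data[from .. from + length) to the target stream.
        return target.WriteAsync(data, (int)from, (int)length);
    }
}
```

The compiler generates the continuation plumbing that the hand-written version builds with ContinueWith and Unwrap, and an exception from a failed copy faults the task returned by ParseAsync without any explicit Wait().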
Optimize storage with deep compression in DB2 10
Get the most from the DB2 HADR standby database
Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction
Have you registered for Kscope12?
Join the Oracle Database Insider team at this year's ODTUG Kscope12 conference in San Antonio, Texas, June 24-28. Use discount code DBI (Database Insider) for $100 off!
ODTUG (Oracle Development Tools User Group) holds its premier event for the Oracle technical community annually at Kscope. ODTUG Kscope12 is the place to be for the Oracle technical community in 2012. If you are a developer, architect, technical lead, or database administrator who works with Application Express, Business Intelligence, Oracle EPM (including Hyperion products, Essbase, and Planning), Database Development, or Fusion Middleware, Kscope12 is where you should be. It's hard to find a conference that's big enough to attract world-renowned speakers yet small enough that you get the chance to share knowledge. Kscope12 is that conference.
Oracle at Kscope12
Sessions/Speakers:
Oracle development experts and Oracle ACE Directors are featured speakers during the conference. In all, Oracle will participate in 54 sessions plus symposium sessions and hands-on lab sessions.
Exhibitor Showcase:
Oracle will have a 10x20 booth in the exhibit center. Encourage your customers and partners to stop by and learn about Oracle in BI, database development, and many other tools. Meet and greet with experts and learn the latest news about Oracle technology and trends!
Networking Activities:
Something for everyone! Check out www.kscope12.com for a host of networking and learning activities. And you won't want to miss the special event on Wednesday night, June 27, when Kscope12 participants leave the high-tech world for Knibbe Ranch (pronounced ka-NIB-bee), an honest-to-goodness working ranch.
Community Service Day:
Calling all volunteers! This year's Community Service Day will be dedicated to giving back by painting and landscaping a clubhouse of the Boys and Girls Club of San Antonio. Please plan to arrive in time to leave at 8:00am on Saturday morning, June 23. Pay it forward by giving something back to the community, and have a great day with people from around the world!
Listen to the Podcast to learn more!
Get Ready for ODTUG's June 24-28 Kscope12!
ODTUG VP Monty Latiolais gives an overview of what to expect at this year's Oracle Development Tools User Group conference, Kscope12, in San Antonio, June 24-28.
Ruby Firebird Extension Library: Fb bumped to version 0.7.0
NO DB - the Center of Your Application Is Not the Database
Uncle Bob:
The center of your application is not the database. Nor is it one or more of the frameworks you may be using. The center of your application are the use cases of your application. […] If you get the database involved early, then it will warp your design. It'll fight to gain control of the center, and once there it will hold onto the center like a scruffy terrier. You have to work hard to keep the database out of the center of your systems. You have to continuously say "No" to the temptation to get the database working early.
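One way to read this advice in code: the use case defines the storage abstraction it needs, and the database becomes a detail plugged in later. The names below (`RegisterUser`, `IUserGateway`, `InMemoryUserGateway`) are hypothetical and the business rules are invented for illustration; this is a sketch of the idea, not Uncle Bob's own example.

```csharp
using System;
using System.Collections.Generic;

// The use case owns the abstraction it needs; storage is a plug-in detail.
public interface IUserGateway
{
    bool Exists(string email);
    void Save(string email);
}

// Hypothetical use case: the application's policy lives here, not in SQL.
public class RegisterUser
{
    private readonly IUserGateway users;
    public RegisterUser(IUserGateway users) { this.users = users; }

    public bool Execute(string email)
    {
        if (string.IsNullOrWhiteSpace(email) || !email.Contains("@"))
            return false;   // illustrative rule: reject malformed addresses
        if (users.Exists(email))
            return false;   // illustrative rule: no duplicate registrations
        users.Save(email);
        return true;
    }
}

// In-memory stand-in used until (or instead of) a real database is chosen.
public class InMemoryUserGateway : IUserGateway
{
    private readonly HashSet<string> emails =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    public bool Exists(string email) { return emails.Contains(email); }
    public void Save(string email) { emails.Add(email); }
}
```

A database-backed IUserGateway can be written once the use cases have settled, without changing RegisterUser.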
Original title and link: NO DB - the Center of Your Application Is Not the Database (NoSQL database©myNoSQL)
Training in London next week
Benchmarking single-row insert performance on Amazon EC2
The Grand Picture of Big Data and the Impact on the Architecture of Systems
In a recent interview for AllThingsD, Mike Rhodin, the senior vice president of IBM's Software Solutions Group, gave a very realistic description of what the future of data looks like:
[…] it comes out of the digitization of the physical world, the instrumentation of physical processes that's going to generate huge amounts of new data, which is going to drive issues around storage, and what to do with all the data, how to analyze it. That pushes you toward real-time analytics and streaming technologies, because with real time, you don't have to save the data: you want to look for anomalies as they occur.
This is indeed the grand picture of Big Data.
Now think for a second how many companies have such systems in place. Not many. Think now how many companies can offer as-complete-as-possible integrated systems to address these challenges. Very few.
These two answers reveal an interesting perspective on the future of the Big Data market.
On one side we have vendors building top-notch solutions (consider the new features in relational databases, NoSQL databases, Hadoop, etc.). Looking at this space, you'll have to agree that these are all excellent solutions for tackling a sub-space of the overall problem. They are getting closer and closer to offering local optimum solutions.
On the other side are the system integrators and platform vendors. Their systems may not be the best at solving every aspect of a problem, but their focus is on addressing and solving the complete problem. Their sales pitch is integration and/or specialization.
As someone writing about polyglot persistence, the 1001 NoSQL and NewSQL databases, and the development of relational databases, I could be tempted to think that every company has the budget, the know-how, and the time to take top-notch sub-systems and create solutions crafted to its problem. But looking back in time and applying the lessons from other markets, I think it is safe to say that integrated solutions are preferred.
The lesson to be learned by both NoSQL and relational database vendors, and actually by all (sub)system vendors playing in the Big Data market, is to design products with openness and integration in mind. Very few, if any, sub-systems will be part of the grand solution if they are architected as silos. They can continue to provide the ultimate local optimum solutions, but as long as they are not architected to be part of a collaborative integrated platform, they'll lose important segments of the market. Many products I'm writing about already follow this principle, many are taking steps towards being friendlier in terms of integration, and many are still taking the silver bullet approach.
Original title and link: The Grand Picture of Big Data and the Impact on the Architecture of Systems (NoSQL database©myNoSQL)
What Big Data Is Used for at Facebook
Just a couple of examples: product and brand engagement, advertising.
A recent study we just published in the Proceedings of the National Academy of Sciences tells a new story about the way people adopt products and engage with them. The prevailing theories about this process suggested that what influences a person [to] adopt technologies is the number or percentage of friends who have already adopted the same technology, along with a person's threshold for adopting such technologies. Our study shows that it's less about the number of your friends who are using the technology, but more about their diversity. […] Some of the work we're interested in understanding is how your friends influence your decisions to engage with advertising and brands.
Original title and link: What Big Data Is Used for at Facebook (NoSQL database©myNoSQL)
Cassandra at Workware Systems: Data Model FTW
One of the stories in which the deciding factor for using Cassandra was primarily the data model and not its scalability characteristics:
We started working with relational databases, and began building things primarily with PostgreSQL at first. But dealing with the kind of data that we do, the data model just wasn't appropriate. We started with Cassandra in the beginning to solve one problem: we needed to persist large vector data that was updated frequently from many different sources. RDBMS's just don't do that very well, and the performance is really terrible for fast read operations. By contrast, Cassandra stores that type of data exceptionally well and the performance is fantastic. We went on from there and just decided to store everything in Cassandra.
Original title and link: Cassandra at Workware Systems: Data Model FTW (NoSQL database©myNoSQL)