Feed aggregator

Finding gaps in partitioned sequences

Magnus Hagander's PostgreSQL blog - Fri, 01/27/2012 - 17:53

There are an almost unlimited number of articles on the web about how to find gaps in sequences in SQL. And it doesn't have to be very hard. Doing it in a "partitioned sequence" makes it a bit harder, but still not very hard. But when I turned to a window aggregate to do that, I was immediately told "hey, that's a good example of a window aggregate to solve your daily chores, you should blog about that". So here we go - yet another example of finding a gap in a sequence using SQL.

I have a database that is very simply structured - it's got a primary key made out of (groupid, year, month, seq), all integers. On top of that it has a couple of largish text fields and an fti field for full text search. (Initiated people will know right away which database this is). The sequence in the seq column resets to zero for each combination of (groupid, year, month). And I wanted to find out where there were gaps in it, and how big they were, to debug the tool that wrote the data into the database. This is really easy with a window aggregate:


SELECT * FROM (
   SELECT
      groupid,
      year,
      month,
      seq,
      -- distance from the previous seq within the same (groupid, year, month)
      seq-lag(seq,1) OVER (PARTITION BY groupid, year, month ORDER BY seq) AS gap
   FROM mytable
) AS t
WHERE NOT (t.gap=1)
ORDER BY groupid, year, month, seq
 

One advantage to using a window aggregate for this is that we actually get the whole row back, and not just the primary key - so it's easy enough to include all the data you need to figure something out.
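For readers who want to check the logic outside the database, here is a minimal sketch of the same per-partition comparison in plain Python. It is only an illustration: the function name find_gaps and the sample rows are made up, and rows are assumed to be bare (groupid, year, month, seq) tuples rather than full table rows. As in the query, the first row of each partition is never reported, because it has no previous value to compare against.

```python
from itertools import groupby
from operator import itemgetter


def find_gaps(rows):
    """Yield (groupid, year, month, seq, gap) for each row whose seq is not
    exactly one more than the previous seq in its (groupid, year, month)
    partition. Hypothetical helper, mirroring the lag() query."""
    partition_key = itemgetter(0, 1, 2)
    # Sorting the full tuples orders by groupid, year, month, then seq.
    for _, partition in groupby(sorted(rows), key=partition_key):
        prev = None  # plays the role of lag(seq, 1); None for the first row
        for groupid, year, month, seq in partition:
            if prev is not None and seq - prev != 1:
                yield (groupid, year, month, seq, seq - prev)
            prev = seq


# Made-up sample data: seq 2 and 3 are missing in (1, 2012, 1),
# and seq 1 is missing in (2, 2012, 1).
rows = [
    (1, 2012, 1, 0), (1, 2012, 1, 1), (1, 2012, 1, 4),
    (1, 2012, 2, 0), (1, 2012, 2, 1),
    (2, 2012, 1, 0), (2, 2012, 1, 2),
]
gaps = list(find_gaps(rows))
```

Each reported row is the one just after a gap, and the last element is the difference from the previous seq, so a value of 3 means two missing entries.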

What about performance? I don't really have a big database to test this on, so I can't say for sure. It's going to be a sequential scan, since I look at the whole table, and not just parts of it. It takes about 4 seconds to run over a table of about a million rows, 2.7GB, on a modest VM with no actual I/O capacity to speak of and a very limited amount of memory, returning about 100 rows. That's certainly fast enough for me in this case.

And as a bonus, it found me two bugs in the loading script and at least one bug in somebody else's code that I'm now waiting on to get fixed...

Categories: Blogs, Open Source

The 2011 Master the Mainframe Contest has come to a conclusion

(Posted January 27, 2012) And the results keep right on improving... we have some very smart kids out there with good futures in the world of computers should they choose to go that direction. To paraphrase Darth Vader (Star Wars): “The Mainframe is strong with this g...
Categories: Blogs, DB2

MapReduce With Hadoop: What Happens During Mapping

myNoSQL - Alex Popescu - Fri, 01/27/2012 - 15:33

An interesting look at what happens during the map phase in Hadoop and the impact of emitting key-value pairs:

  • a direct negative impact on the map time and CPU usage, due to more serialization
  • an indirect negative impact on CPU due to more spilling and additional deserialization in the combine step
  • a direct impact on the map task, due to more intermediate files, which makes the final merge more expensive

The main point of the dynaTrace blog post is that even if Hadoop makes it easy to throw more hardware at a problem, wasting resources with bad code in MapReduce tasks comes with a noticeable and measurable cost.
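To make the cost of emitting concrete, here is a small sketch in plain Python (not the Hadoop API) that compares how many key-value pairs a map step hands to the framework with and without aggregating locally first. The word-count workload and both function names are assumptions chosen for illustration; the linked post may use a different example.

```python
from collections import Counter


def naive_map(line):
    # Emit one (word, 1) pair per token: every pair must later be
    # serialized, possibly spilled, and merged.
    return [(word, 1) for word in line.split()]


def combining_map(line):
    # Aggregate inside the map step and emit one pair per distinct word.
    return list(Counter(line.split()).items())


line = "to be or not to be"
naive = naive_map(line)
combined = combining_map(line)
```

For this input the naive version emits six pairs and the locally aggregated version four. On real data with many repeated keys the difference is much larger, and that is where the serialization and spilling costs listed above add up.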

Original title and link: MapReduce With Hadoop: What Happens During Mapping (NoSQL database©myNoSQL)

Categories: Blogs, NoSQL

Analysts' Predictions for Hadoop Market

myNoSQL - Alex Popescu - Fri, 01/27/2012 - 14:42

With so many players in the market[1], it’s easy to see that not all of them will flourish. IDC has predicted that this year will see a lot of merger and acquisition activity as large technology companies rush to buy smaller companies with expertise in big data. By 2015, the analysts say it’s likely that none of the current “major players” in the Hadoop market will still exist.

These predictions also have a dark, scary side. Not in the sense that existing companies that bring value to the market do not deserve good exits in the next 3-4 years. But most of the time, if not ignored, these statements will lead to an amplification of BS and the creation of a ton of copy-cats bringing no value to a market that still has to see a lot of innovation, adoption, and return on investment for the users.

  1. According to Benjamin Woo, program vice president for worldwide storage systems at IDC, there are over 200 companies that claim to be in the big data space.  ↩

Original title and link: Analysts’ Predictions for Hadoop Market (NoSQL database©myNoSQL)

Categories: Blogs, NoSQL

41-54% DB2 CPU Reduction Achieved using zIIP

Triton Consulting - Thoughts on DB2 - Fri, 01/27/2012 - 12:49
In a current customer engagement we are enabling zIIP to reduce CPU consumption. Our first overnight install has been a great success. On the 1st mainframe LPAR (out of 4) we are seeing initial signs of 41-54% DB2 CPU reduction …
Categories: Companies, DB2

MoreSQL: No More NoSQL

myNoSQL - Alex Popescu - Fri, 01/27/2012 - 11:53

We at MoreSQL believe in the following axioms:

  1. Universal Applicability: there is no such thing as a problem which cannot be solved with relational databases. It doesn’t matter what you’re storing or how you need to use it. Tabular structures (which may or may not be linked via foreign keys) are the only way to go. End of discussion.

  2. Ends Justify Means: as corollary to axiom 1, we will do whatever it takes to make SQL work for us. Views, stored procedures, cross-database calls: you name it, we’ll do it. Oh and by the way, using ORMs does not mean that you’re trying to shove a round peg into a square hole. They are beautiful and enchanting, OK?

  3. Scale, shmale: relational databases can scale well enough. I mean, Facebook is running on MySQL, for crying out loud! Are you better than Facebook and its 10 trillion active users? I didn’t think so.

I’ve already tattooed myself with MoreSQL and I’m distributing printed leaflets with the axioms in all major squares in town.

Original title and link: MoreSQL: No More NoSQL (NoSQL database©myNoSQL)

Categories: Blogs, NoSQL

NoSQL Books: Riak Handbook and the Little Redis Book

myNoSQL - Alex Popescu - Fri, 01/27/2012 - 11:45

A couple of recent books that I’ll be adding to the list of NoSQL books:

  1. Mathias Meyer’s Riak Handbook. You can get an idea of the book by checking Consistent Hashing Explained: The What and the Why, the free sample chapter, and the table of contents.

  2. Karl Seguin’s The Little Redis Book. This is Karl’s second free NoSQL book, after The Little MongoDB Book.

Original title and link: NoSQL Books: Riak Handbook and the Little Redis Book (NoSQL database©myNoSQL)

Categories: Blogs, NoSQL

Measuring User Retention With Hadoop and Hive

myNoSQL - Alex Popescu - Fri, 01/27/2012 - 11:23

A very practical example of how Hive and Hadoop could deliver value when applied to clickstreams, the most common data for each web property:

Hadoop, Hive, and related technologies are formidable tools for unlocking value from data. […] Retention measurements are particularly significant because they paint a detailed picture about the overall stickiness of a product across the entire userbase.

The same clickstream data can be used to calculate visitors’ conversion with the Bayesian discriminant using Hadoop.

Original title and link: Measuring User Retention With Hadoop and Hive (NoSQL database©myNoSQL)

Categories: Blogs, NoSQL

Bug Hunt: What made this blog slow?

Ayende @ Rahien - Fri, 01/27/2012 - 11:00

A while ago the blog started taking 100% CPU on the client machines. Obviously we were doing something very wrong there, but what exactly was it?

We tracked the problem down to the following code. Can you figure out what the problem is?

[image: code screenshot]

Categories: Blogs

The History of NoSQL: This Was Not Our Technology Vendors' Fault

myNoSQL - Alex Popescu - Thu, 01/26/2012 - 22:14

Werner Vogels in the post about Amazon DynamoDB:

We had been pushing the scalability of commercially available technologies to their limits and finally reached a point where these third party technologies could no longer be used without significant risk. This was not our technology vendors’ fault; Amazon’s scaling needs were beyond the specs for their technologies and we were using them in ways that most of their customers were not. A number of outages at the height of the 2004 holiday shopping season can be traced back to scaling commercial technologies beyond their boundaries.

Here is what I wrote about the history behind NoSQL databases:

Providing decent solutions, up to a point, to a wide range of problems and covering more scenarios than alternative storage solutions existing at that time, made relational databases the de facto storage for the last 30 years. But during the last years, more and more problems crossed the boundaries of what could have been considered decent solutions leading to the need for specialized, better than good enough alternative solutions. And thus NoSQL databases.

It feels rewarding to get such confirmation from people that are at the forefront of NoSQL.

Original title and link: The History of NoSQL: This Was Not Our Technology Vendors’ Fault (NoSQL database©myNoSQL)

Categories: Blogs, NoSQL

MySQL Configuration Wizard Updated

MySQL Performance Blog - Thu, 01/26/2012 - 20:52
We’ve released an updated version of the MySQL Configuration Wizard we announced at the end of last year. If you don’t remember that announcement, here’s the short version: this is a tool to help you generate my.cnf files based on your server’s hardware and other characteristics. We’ve gotten really good feedback on this tool, including [...]
Categories: Blogs, MySQL, Open Source

Join us March 7, 2012 for the Virtual Launch of SQL Server 2012!

On March 7, 2012 we are hosting the SQL Server 2012 Virtual Launch Event (VLE), to share the latest on SQL Server 2012 and the evolution of the Microsoft data platform. Through our VLE, anyone, anywhere in the world can simply log in and be a part of this amazing experience – consuming content at your own pace while still experiencing all the benefits of a tradeshow event.

What are some great reasons to check out our VLE experience?

You want to learn from SQL Server insiders
Learn more about the new features of SQL Server 2012 through access to more than 30 sessions. Our experts will demonstrate how your business can go further, forward, faster by capitalizing on mission critical capabilities, new features that drive true business insights and the most cloud-ready SQL Server ever.

You want to engage with Partners and Customers
Visit our Partner Pavilion to discuss how partner and pioneer customer solutions integrate with SQL Server 2012.

You want to chat live with product experts and MVPs
Chat live with product experts and MVPs to get the inside scoop. Our team will be on hand to answer questions about SQL Server 2012 and network in the virtual lounge.

You want to engage with the community – and maybe win a prize!
Participate in virtual launch activities like the keynote speech, technical demos and networking lounge, and collect points to earn cool prizes such as cash gift cards, SQL Server Gear, and Xbox systems. The more points you earn, the bigger your prize could be!

Register today at: www.sqlserverlaunch.com

Categories: Companies, SQL Server

The Co-operative Group saves millions by switching from Oracle to SQL Server

We talk a lot about the features and technical “how to” behind SQL Server, but our favorite topic is showing real-world examples of how it helps our customers achieve their goals and save millions of dollars. One great example of this is The Co-operative Group and their switch from Oracle to SQL Server.

The Co-operative Group operates 5,000 retail stores, and is one of the world’s largest member-owned businesses. The Group’s top strategic priority is expanding its membership base, and they set an aggressive goal to grow to 20 million members by 2020.

But instead of helping the Group achieve its goals, technology and licensing models were a barrier to success. The Group’s previous membership system was an Oracle solution, hosted by a provider that charged per member – a situation where scaling membership would have cost the company tens of millions of dollars. “It would have been financial suicide if we had tried to use the existing solution to accomplish our growth goals,” said Chris Sproston, the head of software development at The Co-operative Group.

To realize the company’s goals, The Co-operative Group turned to SQL Server 2008 R2 and a variety of other Microsoft products and services. Working with Microsoft partner HCL Infosystems, they developed a solution that stores account information, records transactional data and supports web-based self-service account management.

Using this new solution, The Co-operative Group improved member services with enterprise-wide reporting and analysis tools, and increased the security of customer information with highly specific access privileges and transparent data encryption. As a result, the company’s IT department can now spend time innovating and adding features, instead of using its resources to manage new reports or queries.

“Because SQL Server 2008 R2 is so scalable, we can expand from 6 million to 20 million members for 10 percent of what it would have cost on our old solution,” said Chris Sproston, head of software development for The Co-operative Group.

By implementing the Microsoft system, The Co-operative Group will be able to reduce the cost of its long-term growth goal by tens of millions of dollars. And that’s the kind of success story we love to be able to share.

For more information about The Co-operative Group’s solution and how they plan to save millions by switching from Oracle to SQL Server, check out the case study.

Categories: Companies, SQL Server

Big Data Is More Than Hadoop

myNoSQL - Alex Popescu - Thu, 01/26/2012 - 14:40

David Menninger commenting on the results of a Big Data survey run by Ventana Research:

This research shows that big data is not a single thing with one uniform set of requirements. Hadoop, a well-publicized technology for dealing with big data, gets a lot of attention (including from me), but there are other technologies being used to store and analyze big data.

Nobody said Hadoop is the only solution for Big Data. But Hadoop is a leading technology in the Big Data market.

One of the most interesting aspects of the survey is captured by the following:

Research participants cited real-time capabilities and integration as their key technical challenges.

Integration in the world of Big Data is like the old saying about successful web sites: “the more you send them away, the more they will come back”.

Update: Here is what Ventana Research was saying about Hadoop adoption in July 2011:

The research findings indicate that Hadoop is already being used in one third of big data environments and evaluated in nearly another fifth.

While in this one:

One-third (34%) are using data warehouse appliances, which typically combine relational database technology with massively parallel processing. About as many (33%) are using in-memory databases. Each of these alternatives is being more widely used than Hadoop. As well, 15% use specialized databases such as columnar technologies, and one-quarter (26%) are using other technologies.

Original title and link: Big Data Is More Than Hadoop (NoSQL database©myNoSQL)

Categories: Blogs, NoSQL

Mavuno: A Hadoop-Based Text Mining Toolkit

myNoSQL - Alex Popescu - Thu, 01/26/2012 - 14:11

Mavuno is an open source, modular, scalable text mining toolkit built upon Hadoop. It supports basic natural language processing tasks (e.g., part of speech tagging, chunking, parsing, named entity recognition), is capable of large-scale distributional similarity computations (e.g., synonym, paraphrase, and lexical variant mining), and has information extraction capabilities (e.g., instance and semantic relation mining). It can easily be adapted to new input formats and text mining tasks.

I’d love to hear from people with more knowledge in the field how Mavuno compares to Mahout.

Ryan Rosario

Original title and link: Mavuno: A Hadoop-Based Text Mining Toolkit (NoSQL database©myNoSQL)

Categories: Blogs, NoSQL

We won the Rapidus award!

Neo4j Blog - Thu, 01/26/2012 - 13:48
I was running late - meeting across time zones is a hassle. Standing in the street I could hear the heavy rock music from the night club. Was this really the place for a big media event in Malmö? Stepping into the dark it felt totally right though. More than 150 people had dressed down to participate in the mingle and awards that night. Rock away! Rapidus is an online newsletter here in
Categories: Open Source

Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials

myNoSQL - Alex Popescu - Thu, 01/26/2012 - 13:12

Adam Gray[1]:

In this article, I’ll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.

If you put together Amazon S3, Amazon DynamoDB, Amazon RDS, and Amazon Elastic MapReduce, you have a complete polyglot persistence solution in the cloud[2].

  1. Adam Gray is Product Manager on the Elastic MapReduce Team  ↩

  2. Complete in the sense of core building blocks.  ↩

Original title and link: Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials (NoSQL database©myNoSQL)

Categories: Blogs, NoSQL

Northwind Starter Kit Review: Conclusion

Ayende @ Rahien - Thu, 01/26/2012 - 11:00

This is a review of the Northwind Starter Kit project; this review covers revision 94815 from Dec 18 2011.

A while ago I said:

Seriously?!  22(!) projects to do a sample application using Northwind?

And people took me to task about it. The criticism was mostly focused on two parts:

  • I didn’t get that the project wasn’t about Northwind, but about being a sample app for architectural design patterns.
  • I couldn’t actually decide that a project was bad simply by looking at the project structure and some minor code browsing.

I am sad to say that after taking a detailed look at the code, I am even more firmly back at my original conclusion.  I started to do a review of the UI code, but there really is no real need to do so.

The entire project, as I said in the beginning, is supposed to be a sample application for Northwind. Northwind is a CRUD application. Well, not exactly, it is supposed to be an example of an Online Store, which is something much bigger than just Northwind. But it isn’t.

Say what you will, the Northwind Starter Kit is a CRUD application. It does exactly that, and nothing else. It does so in an incredibly complicated fashion, mind, but that is what it does.

Well, it doesn’t do updates, or deletes, or creates. So it is just an R application (I certainly consider the codebase to be R rated, not for impressionable developers).

If you want to have a sample application to show off architectural ideas, make sure that the application can actually, you know, show them. The only thing that NSK does is load stuff from the database. Try as I might, I found no real piece of business logic, nor any reason why it is so complicated.

So, to the guys who commented on that, it isn’t a good project. If you like it, I am happy for you. There are also people who love this guy:

Personally, I would call pest control.

Categories: Blogs

Nested Tables 101

From An Expert’s Guide to Oracle Technology

 

A nested table is much like an associative array but you do not determine the index. The index grows by using the extend command and the index is always an incrementing integer value. You can use the DELETE attribute to delete individual elements so you will always want to

Categories: Blogs, Oracle