Skip to content

Blogs

Caching could be the last thing you want to do

MySQL Performance Blog - Sat, 07/24/2010 - 19:39
I recently had a run-in with very popular PHP ecommerce package which makes me want to voice a recurring mistake I see in how many web applications are architected. What is that mistake? The ecommerce package I was working with depended on caching.  Out of the box it couldn’t serve 10 pages/second unless I enabled some features [...]
Categories: Blogs, MySQL, Open Source

SQLRally Logo Contest

It Depends - Andy Warren - Fri, 07/23/2010 - 18:45

Kendal posted the announcement yesterday that we opened the design content on 99designs. I’ve used the site before and it’s a decent system, typically get a logo for around $200 – and you get to pick from a lot of options. I know most of us are data people, but probably more than a few of us know Photoshop or the equivalent, or someone that does, which means it could be someone from the SQL community that designs the next really big PASS logo. Why not give it a try?

Categories: Blogs, SQL Server

SQL Source Control Webinar

SQL Musings - Steve Jones - Fri, 07/23/2010 - 17:30

SQL_68x68_SrcCon One of the new products that Red Gate Software has released recently is SQL Source Control. I saw a demo of this product back in March as it was nearing completion, and I thought it was pretty cool. I have struggled with source control for years, and build a process that was mainly manual, and one that I’ve seen others use over the years.  It worked well, but it was prone to errors unless I was allowed to chastise developers on a regular basis for not adhering to the process.

friends_of_rg_logo Today I acted as emcee and moderator for a semi-private webinar that demo’d the product to the Friends of Red Gate, a group that is by invitation only, but lets some customers interact with Red Gate similar to how the MVP program grants some people access to Microsoft.

This was the first time I actually acted as a moderator, having been an attendee or presenter in the past. It went well, I think, though I definitely had a little dead air and struggled to keep up with questions and find a smooth way to interrupt the presenter. I would rather be the one presenting and let someone else handle the other work for sure.

I think we’ll be scheduling more of these on a regular basis for some Red Gate products to get the word out more and show how they can be used to solve problems. This is one of the few that I’d purchase myself if I was doing DBA work, so we’ll start here and see what happens.

I’m also hoping to do some other more regular webinars from SQLServerCentral, on more technical topics, across the next year.

Categories: Blogs, SQL Server

Syndication on SQLServerCentral

SQL Musings - Steve Jones - Fri, 07/23/2010 - 17:17

I have a few people ask about this recently, so I thought I’d blog about it, and maybe get an explanation out there.

When we started accepting bloggers on SQLServerCentral, it was a challenge to get people to move their blogs. We had some people that just started blogging with us, but others that were already blogging elsewhere. At first I wasn’t sure what to do, but when SQLServerPedia started syndicating bloggers, using feeds, I requested similar functionality for SQLServerCentral.

We built it, and while it’s a little crude, and we’ve modified things a few times, here’s how it works:

  • You send us a note with
    • Your SQLServerCentral registered email
    • A feed URL
  • You blog in English (sorry, English only)
  • We read the feed and publish your blog on SQLServerCentral.

The Scary DBA has syndicated his blog, so you can see how the different things look on our site v the source. Some things don’t come across cleanly, and I’m working on getting some of those things fixed.

Frequency

We used to read feeds once a day, now we do it once an hour. So if you blog about something, it ought to be on the SQLServerCentral site in under 2 hours. If you think there is an issue, please contact the webmaster.

Hints

I use Live Writer to actually publish this blog on Blogspot. It lets me test things, but also gives me a backup of my blog and control over what I write. I think it’s a good idea to own your content, as well as keep a backup of it on your own blog.

Live Writer is a great piece of software for drafting blogs, scheduling them, working offline, etc. I actually sync my Live Writer folders  across my desktop and laptops. That gives me some additional capabilities to draft stuff on one machine and finish it on another.

I highly recommend people tag stuff they want to syndicate with a specific tag and then send us the feed of that tag. That allows you to still blog about something that you want to publish, but not syndicate it if you don’t want to. I know some people do this to publish other stuff in other languages, or they mix their personal/career blog and don’t necessarily want to send everything to SSC.

Setup

If you want to get setup, send a note to the webmaster at SQLServerCentral.com and we’ll get you syndicated.

Categories: Blogs, SQL Server

Survey results around purchase and use of SSDs

Paul S. Randal on SQL Server - Fri, 07/23/2010 - 16:38

Back at the start of July I kicked off a survey around your plans for SSDs (see here) and now I present the results to you. There's not much to editorialize here, but the numbers are interesting to see.

 

The "other" answers were (verbatim):

  • 3 x 'have bought and am trying them out'
  • 3 x 'not sure if we need them or not'
  • 2 x 'all production servers are hosted'
  • 1 x 'bought them, tried them..not good enough yet for tempdb'
  • 1 x 'Have some, want more, could you really every have enough?'
  • 1 x 'We get every penny from or spinning media, and have no need for SSD'

The results reflect what I've been hearing when teaching classes and talking to customers/conference attendees over the last six months. People are becoming more interested in SSDs but there's still a lot of wariness about them and of course the whole money issue of being able to buy them. I'm also not surprised (given the general readership demographics of this blog) by the number of people who've analyzed their IOPS requirements and concluded that they don't need SSDs to accomplish that.

 

The "other" answers were (verbatim):

  • 3 x 'not in the budget'
  • 1 x 'I plan to buy expensive drives and throw them at you, paul! love, conor'
  • 1 x 'I'm going to do the same thing Conor will do. Denny'
  • 1 x 'OLAP Scale Out'
  • 1 x 'Use them as cache'
  • 1 x 'Using in an EMC V-MAX SAN to dynamically move high workloads to SSD temporarily'

Ahem - thanks Conor and Denny :-)

Another unsurprising set of results that reflects what I've been hearing. One number I'd be interested in drilling deeper into is answer #3 - are people putting/planning to put tempdb on SSDs because that's what they've heard is the best thing to do, or because tempdb truly is the largest I/O bottleneck that can benefit the most from SSDs? That's a set of experiments I'd like to try out with my Fusion-io drives.

The final "other" answer is also interesting - I was talking to a couple of folks from EMC in Ireland about the V-MAX when we were there earlier this month. Very cool idea to migrate data up and down a set of devices with varying latencies (at the block level, not the file level) - I'd like to see more on how the technology copes with one-off operations like consistency checks or backups - do those IOs affect which layer a block resides in?

Anyway, hope you find these results interesting.

Thanks to all those who responded!

Categories: Blogs, SQL Server

Book Review: Victorious (The Lost Fleet)

It Depends - Andy Warren - Fri, 07/23/2010 - 12:45

Victorious by Jack Campbell ($8 at Amazon) is the finale in the six part series about Black Jack Geary leading a space fleet back home from deep in enemy territory. I’ll try not to ruin it for you if you haven’t read them all yet!

The series is good, though I’ll admit to thinking it might have been done in five books or maybe even four. It’s naval battles set in space, with a lot of tactics in three dimensions. There is a bit of intrigue and politics, especially the latter in this last one. A big focus on honor and not descending to the level of opponents, stuff I largely agree with but is rarely so simple in real life, and the author manages to show some of that throughout the books.

The ending is reasonably happy, though parts of it feel rushed – a strange feeling given that overall the story was stretched too much. The villains end up feeling not quite as formidable as they seemed earlier on.

Still, I can easily recommend the series, and it’s always nice to know that when you read a book there are more to follow already done and waiting.

Categories: Blogs, SQL Server

Real world authorization implementation considerations

Ayende @ Rahien - Fri, 07/23/2010 - 09:14

Nitpicker corner: this post discusses authorization, which assumes that you already know who the user is. Discussion of authentication methods, how we decide who the user is, would be outside the scope of this post.

I had a lot of experience with building security systems. After all, sooner or later, whatever your project is, you are going to need one. At some point, I got tired enough of doing that that I wrote Rhino Security, which codify a lot of the lessons that I learned from all of those times. And I learned a lot from using Rhino Security in real world projects as well.

When coming to design the authorization bundle for RavenDB, I had decided to make a conscious effort to detail the underlying premise that I have when I am approaching the design of a security system.

You can’t manage authorization at the infrastructure level

That seems to be an instinctual response by most developers when faced with the problem, “we will push it to the infrastructure and handle this automatically”. The usual arguments is that we want to avoid the possibility of the developer forgetting to include the security checks and that it makes it easier to develop.

The problem is that when you put security decisions in the infrastructure, you are losing the context in which a certain operation is performed. And context matters. It matters even more when we consider the fact that there are actually two separate levels of security that we need to consider:

  • Infrastructure related – can I read / write to this document?
  • Business related – can I perform [business operation] on this entity?

Very often, we try to use the first to apply the second. This is often the can when we have a business rule that specify that a user shouldn’t be able to access certain documents which we try to apply at the infrastructure level.

For a change, we will use the example of a debt collection agency.

As a debt collector, I can negotiate a settlement plan with a debtor, so the agency can resolve the debt.

  • Debt collectors can only negotiate settlement plans for debts under 50,000$
  • Only managers can negotiate settlement plans for debts over 50,000$

Seems simple, right? We will assume that we have a solution in store and say that the role of DebtCollectors can’t read/write to documents about settlement plans of over 50K$. I am not sure how you would actually implement this, but let us say that we did just that. We solved the problem at the infrastructure level and everyone is happy.

Then we run into a problem, a Debt Collector may not be allow to do the actual negotiation with a heavy debtor, but there is a whole lot of additional work that goes on that the Debt Collector should do (check for collateral, future prospects, background check, etc).

The way that the agency works, the Debt Collector does a lot of the preliminary work, then the manager does the actual negotiation. That means that for the same entity, under different contexts, we have very different security rules. And these sort of requirements are the ones that are going to give you fits when you try to apply them at the infrastructure level.

You can argue that those sort of rules are business logic, not security rules, but the way the business think of them, that is exactly what they are.

The logged on user isn’t the actual user

There is another aspect for this. Usually when we need to implement security system like this, people throw into the ring the notion of Row Level Security and allowing access to specific rows by specific logins. That is a non starter from the get go, for several reasons. The previous point about infrastructure level security applies here as well, but the major problem is that it just doesn’t work when you have more than a pittance of users.

All Row Level Security solutions that I am aware of (I am thinking specifically of some solutions provided by database vendors) requires you to login into the database using a specific user, from which your credentials can be checked against specific rows permissions.

Consider the case where you have a large number of users, and you have to login to the database for each user using their credentials. What is going to be the affect on the system?

Well, there are going to be two major problems. The first is that you can wave goodbye to small & unimportant things like connection pooling, since each user have their own login, they can’t share connections, which is going to substantially increase the cost of talking to the database.

The second is a bit more complex to explain. When the system perform an operation as a result of a user action, there are distinct differences between work that the system performs on behalf of the user and work that the system performs on behalf of the system.

Let us go back to our Debt Collection Agency and look at an example:

As a Debt Collector, I can finalize a settlement plan with a debtor, so the agency can make a lot of money.

  • A Debt Collector may only view settlement plans for the vendors that they handle debt collection for.
  • Settlement plan cannot be finalized if (along with other plans that may exists) the settlement plan would result in over 70% of the debtor salary going into paying debts.

This is pretty simple scenario. If I am collecting debts for ACME, I can’t take a peek and see how debts handle be EMCA, ACME’s competitor, are handled. And naturally, if the debtor’s income isn’t sufficient to pay the debt, it is pretty obvious that the settlement plan isn’t valid, and we need to consider something else.

Now, let us look at how we would actually implement this, the first rule specifies that we can’t see other settlement plans, but for us to enforce the second rule, we must see them, even if they belong to other creditors. In other words, we have a rule where the system need to execute in the context of the system and not in the context of the user.

You will be surprised how often such scenarios come up when building complex systems. When your security system is relying on the logged on user for handling security filtering, you are going to run into a pretty hard problem when it comes the time to handle those scenarios.

Considerations

So, where does this leave us? It leave us with the following considerations when the time comes to build  an authorization implementation:

  • You can’t handle authorization in the infrastructure, there isn’t enough context to make decisions there.
  • Relying on the logged on user for row/document level security is a good way to have a wall hit your head in a considerable speed.
  • Authorization must be optional, because we need to execute some operations to ensure valid state outside the security context of a single user.
  • Authorization isn’t limited to the small set of operations that you can perform from infrastructure perspective (Read / Write) but have business meaning that you need to consider.

Categories: Blogs

Informix Editions revisited / Versões Informix revisitadas

Informix technology - Fernando Nunes - Thu, 07/22/2010 - 23:44
This article will be in English and Portuguese.
Este artigo será em Inglês e Português

English version:

After the announcements in May, new changes were made to Informix packaging. Basically the free versions for Windows and Mac OSX (Ultimate-C) were withdrawn. To replace them the Innovator-C version (also free, but with lower usage limits) will be available for all platforms (including the Windows and Mac OSX).

Also a new edition will be available, called Choice Edition, just for Windows and Mac OSX. This one will have license costs, but is cheaper than Growth Edition. The limits are higher than Innovator-C and lower than Growth Edition.

I believe these changes have created some confusion, but there are some important points to note:

  1. Clients will still be able to use a version without license costs for all platforms supported by Informix (Innovator-C). The usage limitations are fairly acceptable for a lot of usages
  2. Windows and Mac OSX are target platforms with a more aggressive price, considering Choice Edition (again, with fairly respectable functionality and usage limits)
  3. Existing customers are not necessarily affected by the new editions. They already payed for the licenses, and are (probably) paying maintenance/support. Obviously they can consider the new editions for new deployments
  4. Existing Enterprise Edition customers get more value, because Ultimate Edition includes more features than the Enterprise
The last changes were announced a few days ago and are explained in IIUG website, more specifically here:
http://www.iiug.org/news/announcements/new_prices.php

I delayed the publishing of this article because I was expecting the website comparing the different versions to be updated:

http://www.ibm.com/developerworks/data/library/techarticle/dm-0801doe/

This is the best, and the official place to check each version available functionality.

Other links:



VersĂŁo Portuguesa:

Depois dos anúncios em Maio, novas alterações foram feitas nas versões de Informix disponíveis. Basicamente as versões grátis para Windows e Mac OSX (Ultimate-C) foram descontinuadas. Para as substituir a versão Innovator-C (igualmente grátis, mas com limites de utilização mais baixos) estará disponível para todas as plataformas (incluindo Windows e Mac OSX)

Uma nova versão estará disponível, designada por Choice Edition, apenas para Windows e Mac OSX. Esta versão terá custos de licenciamento, mas será mais barata que a Growth Edition. Os limites situam-se entre a Innovator-C e a Growth Edition.

Acredito que estas alterações provoquem alguma confusão, mas há que salientar alguns pontos:

  1. Os clientes poderão utilizar uma versão sem custos em todas as plataformas suportadas pelo Informix (Innovator-C). As funcionalidades incluídas e os limites impostos não deverão ser obstáculo a um leque grande de ambientes.
  2. Windows e Mac OSX são plataformas alvo, com preços mais agressivos, considerando a versão Choice (também com funcionalidades e limites de utilização bastante respeitáveis.
  3. Os clientes existentes não são necessariamente afectados pelas novas versões. Na realidade o custo das licenças já foi pago, e o que estarão (provavelmente) a pagar são os custos de manutenção/suporte. Naturalmente poderão considerar as novas versões para novos projectos.
  4. Clientes actuais da Enterprise Edition obtém mais valor, porque a versão Ultimate incluí mais funcionalidade que a Enterprise
As últimas alterações foram anunciadas alguns dias atrás e estão explicadas no website do IIUG, mais especificamente aqui:

http://www.iiug.org/news/announcements/new_prices.php

Atrasei um pouco a publicação deste artigo porque estava à espera que o website que tem a comparação entre as versões fosse actualizado:

http://www.ibm.com/developerworks/data/library/techarticle/dm-0801doe/

Este Ă© o melhor sitio (e o oficial) para verificar as funcionalidades e limites de cada versĂŁo.

Outros links:
Categories: Blogs

RunAs Radio interview with Paul and Kimberly

Paul S. Randal on SQL Server - Thu, 07/22/2010 - 22:01

Last week we sat down with our good friends Richard Campbell and Greg Hughes of RunAs Radio and recorded an interview about how to learn effectively, plus some of the usual nonsense and insults.

Check it out at http://www.runasradio.com/default.aspx?showNum=169.

Enjoy!

Categories: Blogs, SQL Server

Oracle OpenWorld/JavaOne/Oracle Develop 2010

The OOW 2010 schedule builder is up and ready to run. It's pretty cool. You can do various searches for sessions, vendors, exhibits, etc. You can also find content by user tagging. Something I would like to get a look at is the recommendation engine that's using Oracle Data Mining.
Categories: Blogs, Oracle

PGDay.EU announced and call for papers

Magnus Hagander's PostgreSQL blog - Thu, 07/22/2010 - 18:57

PGDay.EU 2010 has finally been announced. It will be in Stuttgart, Germany, on December 6th to 8th. More details available on the conference website.

We have also sent out our call for papers. If you have done something interesting with PostgreSQL, please go ahead and submit a talk! We are currently looking for talks in both English and German!

Categories: Blogs, Open Source

CORRECTION: MySQL Idiosyncrasies that BITE Webinar

Ronald Bradford - MySQL Expert - Thu, 07/22/2010 - 17:00

If you have not looked at my recent presentation that I presented at ODTUG Kaleidoscope 2010, then feel free to join me on Tuesday, July 27, 2010 6:00 PM – 7:00 PM EDT tomorrow at 5pm EDT when I will giving a webinar on this talk. You can register online at https://www1.gotomeeting.com/register/730452824

Categories: Blogs, MySQL

Local Library Policies

It Depends - Andy Warren - Thu, 07/22/2010 - 12:32

Over the past couple years our local library system has been under financial pressure, resulting in some changes, and maybe not all of them bad, but still frustrating. Hoping those of you reading today might offer some insight into your local library policies so I can decide whether to dig deeper or just live with it!

  • All branches closed on Friday
  • Cannot checkout any books if any unpaid fine
  • Cannot renew a checked out book more than 4 times
  • Fines only paid in cash/check

The Friday closing I get, definitely savings to close the office, though I wish they would leave one branch open. More than once I’ve wanted to go on a Friday and couldn’t. They’ve told me that the reason behind requiring all fines to be paid is to increase circulation, but I find fault with that argument. You can return all your books, owe $.10, and not be able to check out any books. I rarely carry cash and there is no ATM machine in the building. I would have no problem paying the merchant fee on top of the fines. This is supposed to happen eventually, but it’s not been funded yet that I can see.

The limit of 4 renewals – don’t know. They only let you renew if no one has a hold on it, but I can see that there should be a max, whether 4 or 14. This one annoyed me recently when I returned books and forgot one that I’d had for a while (about growing tomatoes) and actually remembered to bring cash with me, but I couldn’t renew the book, couldn’t pay the fine on it because it wouldn’t calculate until returned – leaving me the option of paying for the book plus a $10 fee (which I considered), or having them hold all the books I’d selected on a Thursday evening that I wanted for a rare Friday at at the beach.

So, I’m trying not to rant, yet I’m not sure those policies are all that customer friendly.

Categories: Blogs, SQL Server

SQL Saturday Advice - Work with Sponsors

SQL Musings - Steve Jones - Thu, 07/22/2010 - 10:47

The hardest thing, I think, for most new organizers of a SQL Saturday is the fundraising. A few years ago I heard Jerry Jones, owner of the Dallas Cowboys, give his 5 rules for being a good salesman.

Number one is ask for the money. He couldn’t remember the other four :)

I think that’s a fine set of rules, but for many people, it’s really hard to ask for money. Surprisingly they are scared to do it. Maybe not surprisingly since I find it hard as well. It’s not though, as I’ve learned over the year, but I still feel trepidation that I’m asking for more than I’m giving at times.

I caught a great blog from one half of the Midnight DBA, Jen McCown (Blog|Twitter) on SQL Saturday sponsorships. It’s a great read, and there’s another good one from Ryan Adams. Sponsorship is important to a degree because there are some costs with a SQL Saturday. You might get free space, but often have to pay insurance. There’s lunch, though you can charge people. There’s signage, badges, other incidentals, but they still cost money and your significant other might not be thrilled with you paying out of pocket.

I don’t think it’s hard to get some sponsors, and I know that my company, Red Gate, is usually happy to help out some, PASS gives a little, a few other vendors will likely chip in. There are two keys that I’ve seen from talking with people.

  1. Communicate often and clearly
  2. Negotiate and be willing to take less (and give less).

You might think if you don’t get a response that your email was clear and that someone just doesn’t want to sponsor you. I will tell you that if you send 10, I bet that 7 of them aren’t clear to the receiver, or they aren’t read. 2 or 3 people might need to think about it and then will forget it. That means maybe one positive response from your email.

What do you do?

Send more emails. Send them repeatedly. Not botheringly-often like every day, but every week to ten days repeat them. Try new sponsors, try making a call or setting up a 10 minute meeting to talk about it. Use clear materials like Jen lists, and then be willing to work with them. Someone might not want to sponsor you for $1000, but they might be willing to do $200. If that’s the case, offer less. Be willing to shift slightly on your rate card, and negotiate with them. Ryan has good advice for being sure you deliver on your promises, keep them happy, and have them sponsor you again.

And have fun with it. It’s not the end of the world if someone walks away, and there are usually lots of sponsors to choose from. You don’t need a huge budget, and I’d plan for a small one. You can add things in easier than taking them away.

Categories: Blogs, SQL Server

BlitzDB Crash Safety and Auto Recovery

Toru Maesaka - Thu, 07/22/2010 - 10:43

Crash Safety is a big deal in the database league. Lack of durability can lead to all sorts of terrible things upon a catastrophic event. Many projects, especially in the so called NoSQL world compromises crash safety in return for higher QPS. The argument there is that the availability of the overall system should be accomplished by replication since a database server can’t be rescued if the physical disk breaks. I happen to agree with this philosophy but I am also aware that this isn’t a correct answer for everyone. So, what will I do with BlitzDB?

Several relational database hackers have pointed out that BlitzDB isn’t any safer than MyISAM since it doesn’t guarantee crash safety. This is currently true but I plan on making BlitzDB much safer than MyISAM by providing following features.

  1. Auto Recovery Routine (startup option)
  2. Tokyo Cabinet’s Transaction API (table-specific option)

The second feature above would actually guarantee BlitzDB to be crash safe (especially combined with auto recovery) but I won’t get into depth in this post since this topic deserves a blog post of it’s own. Let me just state that this feature will be provided in a form like this:

CREATE TABLE t1 (
  a int PRIMARY KEY,
  b varchar(256)
) ENGINE = BLITZDB, CRASH_SAFE;

From here on, I’ll cover how I plan on hacking auto recovery in BlitzDB.

Auto Recovery Challenges

As I blogged a while back, recovering Tokyo Cabinet is relatively simple. However, this is not a sufficient solution in BlitzDB since the data file (hash database that actually holds the rows) and the index file(s) are independent from each other. That is, the likelihood of the data file and the index file(s) to be inconsistent is very high after a crash. So, how can we hack on this? Pretty simple.

Indexes aren’t Important at Recovery Phase

Because BlitzDB logically separates the data file and it’s indexes, index files aren’t that important. If a server crash had occurred, BlitzDB could delete the index file(s) and recompute them from the data file. Needless to say, this process would involve a lot of random access and computation but it would not dominate the time space of the system since it’s a one-time cost. This approach however has one flaw in it such that the index files can’t be recomputed if the data file is broken or is unrecoverable.

Therefore to guarantee crash safety, BlitzDB must ensure that the data file is unbreakable. This is precisely where Tokyo Cabinet’s Transaction API comes in. I’m planning on using it to protect the data file from breaking. If the data file is protected, the table can be rescued. Simple!

So, that’s what I have in mind for making BlitzDB a safer engine. Unfortunately I can’t start hacking on it immediately since I have several bugs to fix first. Nevertheless I’m looking forward to start hacking on it. This challenge should be quite fun to tackle.

Categories: Blogs, MySQL

An interesting RavenDB bug

Ayende @ Rahien - Thu, 07/22/2010 - 07:37

I got a very strange bug report recently,

The following index:

from movie in docs.Movies
from actor in movie.Actors
select new { Actor = actor }

Will produce multiple results from a single document, which poses a pretty big problem when you try to page through that. Imagine that each movie has 10 actors, and you are trying to page through this index for the first two documents of movies by Charlie Chaplin. The first movie that matches Charlie Chaplin will have ten results returned from the index, and simple paging at the index level will give us the wrong results.

Here is my solution for that, which works, but make me just a tad uneasy:

public IEnumerable<IndexQueryResult> Query(IndexQuery indexQuery)
{
    IndexSearcher indexSearcher;
    using (searcher.Use(out indexSearcher))
    {
        var previousDocuments = new HashSet<string>();
        var luceneQuery = GetLuceneQuery(indexQuery);
        var start = indexQuery.Start;
        var pageSize = indexQuery.PageSize;
        var skippedDocs = 0;
        var returnedResults = 0;
        do
        {
            if(skippedDocs > 0)
            {
                start = start + pageSize;
                // trying to guesstimate how many results we will need to read from the index
                // to get enough unique documents to match the page size
                pageSize = skippedDocs * indexQuery.PageSize; 
                skippedDocs = 0;
            }
            var search = ExecuteQuery(indexSearcher, luceneQuery, start, pageSize, indexQuery.SortedFields);
            indexQuery.TotalSize.Value = search.totalHits;
            for (var i = start; i < search.totalHits && (i - start) < pageSize; i++)
            {
                var document = indexSearcher.Doc(search.scoreDocs[i].doc);
                if (IsDuplicateDocument(document, indexQuery.FieldsToFetch, previousDocuments))
                {
                    skippedDocs++;
                    continue;
                }
                returnedResults++;
                yield return RetrieveDocument(document, indexQuery.FieldsToFetch);
            }
        } while (skippedDocs > 0 && returnedResults < indexQuery.PageSize);
    }
}

Categories: Blogs

The IBM Mainframe has just raised the bar: Introducing the "IBM zEnterprise System"

No more subtle hints at the end of this blog. July 22, 2010 has arrived. With it, IBM's latest (and maybe greatest) mainframe has arrived also. The IBM zEnterprise System; their hybrid mainframe. With 96 5.2 GHz processors, up to 3 TB of memory... more to come...
Categories: Blogs, DB2

Instant File Initialization Speeds SQL Server

Aloha DBA - Brad M. McGehee - Thu, 07/22/2010 - 02:20

Sometimes, its just the smallest of details that can make all the difference. For example, on my test system (see the end of this posting for a description), I created a new 50GB database. The database creation process took about 5 minutes and 50 seconds to complete.

Next, I populated the database with over 61 million rows of data, which virtually occupied all of the available space within the newly created database. After that, I backed up the database using SSMS, and then I deleted the original database.

At this point, I restored the database from the backup using SSMS. Below, you see the typical Restore Database screen.

image 

At the bottom, right-hand side of the screen in the Progress box, notice the “Executing (0%)” indicator. Between the time I clicked the OK button to begin the restore, and when the “Executing (0%)” counter began to move, it took about 5 minutes and 50 seconds. At that point, the counter began to increment and the database was restored.

Now I make one very small change to my SQL Server instance (I’ll describe it in just a moment), and then I repeat the above steps (after deleting the database I just restored). First, I created a new 50GB database. This time, instead of taking 5 minutes and 50 seconds to create the database, it takes just under 2 seconds, a savings of about 5 minutes and 48 seconds. Next, I populated the database with the same amount of data as before, backed it up, and then deleted the original file. When I restored the database this time around, instead of having to wait 5 minutes and 50 seconds before the backup began to restore, I only had to wait just under 2 seconds. In both of these cases, I saved a significant amount of time.

So what was the very small change that I made, and why did it radically reduce the amount of time for database creation and database restoration to occur? I turned instant file initialization on.

What is Instant File Initialization?

In my first two examples, before instance file initialization was turned on, the reason it took so long for the database to be created, or the database to be restored (before a database can be restored, its space must first be pre-allocated, much like creating a new database), SQL Server had to go to every page in the 50 GB database and zero each one of them out. It can take a lot of time for SQL Server to go to every 8K page in a file (especially very large files) and physically zero out each page. When instant file initialization is turned on, SQL Server doesn’t have to zero out every 8K page that has been allocated. Instead, the space is just allocated to SQL Server by the operating system in one fell swoop, which is a very quick process, potentially saving you a great deal of time.

How Do You Turn Instant File Initialization On?

Unlike most configuration features in SQL Server, there is no on/off switch for instant file initialization. Instead, you have to assign a specific user right to the SQL Server Service (mssqlserver) account. Here’s what you need to do to turn on instant file initialization.

First of all, to use instant file initialization with SQL Server in a production environment, you must be using some combination of:

  • Windows Server 2003 or
  • Windows Server 2008 or
  • Windows Server 2008 R2

and using:

  • SQL Server 2005 (any edition) or
  • SQL Server 2008 (any edition) or
  • SQL Server 2008 R2 (any edition)

Second, you must assign the SQL Server Service (mssqlserver) a special user right called “Perform volume maintenance tasks”. To do this, start the Local Security Policy tool (you must be a local administrator to perform this task), then drill down to Security Settings | Local Policies | User Rights Assignment | Perform volume maintenance tasks, as you see in the screenshot below.

Once you have located “Perform volume maintenance tasks”, right-click on it and select “Properties”, and the “Perform volume maintenance tasks Properties” screen appears. Click on “Add User or Group” and then proceed through the remaining screens until you select the account that is being used as the service account for SQL Server. In the screen shot below, notice that I have added the BRADMCGEHEE\sqlserverservice account to this user rights assignment. This is the user account I use on my test server to run my SQL Server instance.

Once the SQL Server service account has been assigned this user right, you will have to restart the SQL Server service (of course, only when it is not being used), and from this point forward, instant file initialization is turned on for all MDF files in your SQL Server instance.

Note: If your SQL Server service account is a member of the local administrators group, then the account already has the “Perform volume maintenance tasks” user right and you don’t need to assign it again.

Why Isn’t Instant File Initialization Turned On by Default?

When a SQL Server instance is first installed, one of the things you must enter is a SQL Server service account. If you follow the best practice and select a domain user account to be used as the SQL Server service account, the setup process automatically assigns the domain user account with only just enough rights and permissions to run SQL Server. The “Perform volume maintenance tasks” user right is not automatically assigned during installation because it is not required to run SQL Server, and because allowing the service account to have this additional user right introduces a very small security risk.

Oh no, a security risk! Well, not really much of a security risk. Here’s the possible security risk scenario. The disk that is being used to create the new database on has been used for storing data that has been previously deleted. As you may know, when data is deleted from disk by the operating system, it really is not physically deleted; the space holding the data is just marked as being available. At some point, the older data will be overwritten with new data. This occurs all the time on millions of computers throughout the world every day. And as such, any data that has been marked for deletion, but not yet overwritten, is potentially available for access if you have the right tools and know what you are doing. In fact, undelete software uses this to recover data that has been accidently deleted.

When instant file initialization is not turned on, and when SQL Server allocates space for an MDF file, each of the pages allocated for the database is zeroed out, which removes the older data, in theory, preventing it from being accessed. I say “in theory” because there are computer forensics techniques that can even recover data that has been overwritten, but that discussion is really not applicable here.

So if instant file initialization is turned on, there is a very slight risk that someone could go to the pages allocated for the new database and read any older data that still may exist there. This is essentially a non-issue in virtually every organization, other than those that require very high security. But because of this potential security issue, instant file initialization is not turned on by default.

If instant file initialization is turned on, and pages are not zeroed out when the database is initially created, SQL Server will automatically overwrite any data that might have been on those pages when SQL Server needs that space.

When Is Instant File Initialization Used?

If instant file initialization is turned on, it is used in all of these cases:

  • When a database is first created
  • When a an existing database’s size is manually increased
  • When tempdb is recreated each time SQL Server is restarted
  • When autogrowth kicks in
  • When backups are restored (as the space has to be pre-allocated before a restore can occur) 

Instant file initialization only affects MDF and NDF files, not LDF files. In other words, transaction log files can’t take advantage of instant file initialization. This is because log files are circular in nature and must be zeroed out, as random data in transaction log pages can be problematic. In my earlier test, when I created a new 50 GB database, the MDF file was 50 GB and the log file was only 1 MB. If I had created a large log file (which is not uncommon), it would have taken awhile for the log to be created, although the MDF file would have been instantly created. This is also true when you manually increase the size of a log file, or when log file autogrowth occurs. In other words, don’t expect to have all of your databases (MDF and LDF files) created in less than 2 seconds like in my test. While the MDF will be created virtually instantly, the log file may take awhile to be created.

When I was working with SQL Server 2000 a few years back, which does not support instant file initialization, one of the things that annoyed me the most when restoring large databases was waiting for the database space to be allocated before the restore actually began. During emergency database restores, this wasted a lot of precious time, preventing me from getting the database back into production as fast as I would have preferred. If you aren’t using instant file initialization today, you are facing this same problem. That’s why I recommend all SQL Server 2005/2008 instances have instant file initialization turned on. The time saved when restoring databases is the best reason to use instant file initialization.

Check to See if Your SQL Server Instances Have Instant File Initialization Turned On

Hopefully, by now, you see the benefits of using instant file initialization. Assuming that you don’t already know if instant file initialization is turned on or off on the SQL Servers your manage, I challenge you to check and see, and if you find it turned off, turn it on and reap its many benefits.

 

Test Hardware

  • Dell T610 Tower, with a single, 6-core CPU (Intel Xeon X5670, 2.93 Ghz, 12M Cache, HT, 1333MHz FSB); 32GB 1333MHz RAM; a PERC H700 RAID controller; two 146GB 15K SAS Drives; one dual-port HBA (to connect to the DAS); and dual network connections. Hyper-threading turned off.
  • One PowerVault MD3000 DAS with two, dual-port controllers, and 15 146GB 15K SAS drives. MDF files located on RAID 10 array with 10 spindles, LDF files on RAID 10 array with 4 spindles, backup drive on a single spindle.

 

 

 

 

Categories: Blogs, SQL Server

3 webinars on Upgrading MySQL

Ronald Bradford - MySQL Expert - Wed, 07/21/2010 - 23:15

The IOUG Online Education Series: Get Real with Upgrades will include next week 3 different MySQL webinars. These are:

  • MySQL 5.1: Why and How to Upgrade by Sheeri Cabral on Tuesday, July 27, 12:00 p.m. – 1:00 p.m. CT
  • MySQL Upgrades With No Downtime by Sean Hull on Wednesday, July 28, 12:00 p.m. – 1:00 p.m. CT
  • MySQL Upgrade Best Practices by Matt Yonkovit on Thursday, July 29, 12:00 p.m. – 1:00 p.m. CT

Each speaker will be covering different areas and reasons for considering and implementing a successful MySQL Upgrade.

Categories: Blogs, MySQL

It’s the Platform

SQL Musings - Steve Jones - Wed, 07/21/2010 - 22:42

I keep seeing lots of debate about Android v the iPhone 4. I have friends talking about how cool Windows Mobile 7 will be, and even some talk about Nokia making a comeback. However it seems that so many people are just missing what’s cool about the iPhone.

It just works.

Not perfectly, not all the time, but it’s just so simple to make things happen. And if you want to try something else, there’s likely a dozen apps that do that one, quirky thing you want, and you can pick the one that works for you.

70618861[1] As an example. I was on the phone with a friend today and they recommended Career Warfare to me. He told me the author and while he spent 2-3 minutes explaining what he’d learned from it and why he’d recommend it, I surfed over to Amazon, found the book in a Kindle edition, selected my phone, and shot the book to it. A quick $12 purchase for me, some money for the author and Amazon, and befoer I got off the phone, I clicked the Kindle app and saw it download through wi-fi.

There’s nothing magic here, and Kindle is on Android now, but that platform, with things being that easy, is what makes the iPhone cool. Most apps are that easy to find, and I can find them in iTunes, see large screenshots and read reviews easily in a way that I couldn’t in the early Android Marketplace. Maybe it’s easier now, and maybe I can “send” things to my phone, and if so, they’re learning.

That cool, simple, seamless platform that just seems to work so often and intuitively, is the reason I love the iPhone.

Categories: Blogs, SQL Server