Preliminary Program for the Mark Logic 2010 User Conference
It’s 60-something days until the Mark Logic 2010 User Conference, which is being held at the lovely Intercontinental Hotel in San Francisco from 5/4/10 to 5/6/10, with pre-conference training available on 5/3/10. The early-bird $695 registration rate has been extended until 3/10/10, so you still have one week to register at this discounted rate. (Note that the early-bird government rate is a real steal at $495.) So please register now.
In addition, please note that this year we have added (1) the ability to pay the registration fee using training credits and (2) an integrated package that includes the pre-conference training with the program registration which might help people use training budget to attend this educational event. Contact Kelly Stirman, director of Mark Logic University, at kelly-dot-stirman at marklogic-dot-com if you’re interested.
In order to help you decide to attend, I’ve embedded a video overview of the 2009 event and the 2010 preliminary program below.
Here’s the video:
And here’s the preliminary conference program:
MarkLogic UC10 Conference Agenda Sessions
And did I mention that this year we will have our first user conference party!
Building Task-Aware Mobile Applications
As mentioned previously, Mark Logic participated in this week’s O’Reilly Tools of Change for Publishing conference in New York City. Below please find the presentation made by Mark Logic principal technologist Fernando Mesa on building task-aware information applications on mobile devices.
The Database Tea Party: The NoSQL Movement
Adam Smith’s invisible hand never rests. Just five years ago, the database market looked like a static, three-player $10B/year oligopoly where the primary forces were inertia and profit-taking. Today, we have two major forces disrupting the comfortable stasis that has developed over the past 30 years.
- One force is DBMS specialization: while the general-purpose RDBMS is useful for a broad range of applications, it is optimal for few of them. The RDBMS has slowly become expensive bloatware that is functionally a jack of all trades, master of none. MIT’s Michael Stonebraker calls the RDBMS a one size fits all solution.
- The other force is NoSQL, an organic and rapidly-growing industry movement away from relational databases, driven by a number of factors including both technology and cost.
The purpose of this post is to share my thoughts on NoSQL. Make no mistake, like the Tea Party Movement, NoSQL is a rebellion; just look at the name. But like most demonstrations, not everyone is marching for the same reasons. Here are some of the things I think various members of the NoSQL crowd are marching against:
- Table-oriented, 1960s-era database technology: RDBMSs were designed for handling data and short-text fields, necessitate mapping programmatic objects to tables (i.e., the impedance mismatch), and require the use of an increasingly stone-age query language, SQL.
- Scalability: relational databases were not designed to handle and do not generally cope well with Internet-scale, “big data” applications. Most of the big Internet companies (e.g., Google, Yahoo, Facebook) do not rely on RDBMS technology for this reason.
- High prices and the heavy-handed treatment of customers: both stem from the underlying oligopoly and the lack of credible alternative suppliers.
- Closed source: the inability to customize the internals of the DBMS engine to meet specific needs.
- Bloatware: ironically, while RDBMSs are perceived as light on requirements that matter (e.g., scalability), they are also seen as over-engineered for features that don’t. (ACID transactions are a favorite target in this department.)
- DBA supremacy. For years, corporate DBAs called the shots on where strategic data assets would be stored, and thus how they would be accessed. This created headaches for the programmers of the world who, in response, have done as much as possible to abstract away the database (e.g., Ruby on Rails).
On the flip side, there are things the NoSQL crowd are fighting for:
- Open source, implying control. The ability that open source software provides to customize product functionality.
- Open source, implying free. The often-flawed notion that the absence of software license fees results in a reduced lifetime cost of ownership.
- Coolness, or the “I want to be like Google” effect. If Google’s got BigTable, Yahoo’s got Hadoop, and Facebook’s got Cassandra, then we should build our own, too. Our app is hard; we’re smart guys, too.
- Vengeance, or the “I’m so mad at Oracle that I’ll do anything” effect. Yes, some folks are just plain mad enough at Oracle to either go write their own DBMS, or take on the support of a very low-level infrastructure technology.
So, if you’re considering a NoSQL solution — a class in which I include MarkLogic — you need to figure out what you’re marching against, what you’re fighting for, and ultimately what will meet your needs at the lowest total cost of ownership.
My first recommendation is to detect and, where applicable, kill off the coolness effect. Google is swimming in money and PhDs. They can build anything they want regardless of whether they should and, right or wrong, for Google it just doesn’t matter. So unless you have Google’s business model and talent pool, you probably shouldn’t copy their development tendencies.
Heck, I get the coolness attraction. I think infrastructure software is cool, too. That’s why I was an OS geek early on and have spent my career around databases. But I surely don’t think that F1000 companies and government agencies should build their own DBMSs, nor fall into the trap of thinking that open source low-level stores are a free and easy way to avoid Oracle license fees. Cool shouldn’t be in the equation. Technology suitability and total cost should be. Period.
My second recommendation is to orthogonalize the open source question, making it independent of functional requirements. (This breaks if source customization is a requirement, but remember that requirement is often fictional: most open source users don’t customize.) If you’re struggling with an RDBMS on a given application problem, you shouldn’t say: we need an open source, NoSQL-type thing. You should say: we need to look at relational database alternatives. Those alternatives include open source database projects (e.g., MongoDB, CouchDB) and distributed computing frameworks (e.g., Hadoop), but they also include commercial software offerings such as specialized DBMSs like Streambase (for real-time streams), Aster (for analytics on big data), and MarkLogic (for semi-structured data). Don’t throw out the commercial-software-benefits baby with the RDBMS bathwater.
My personal take on this issue is that:
- Relational databases, like the mainframe in 1985, are entering the autumn of their lives. They won’t die quickly, and the mainframe isn’t dead today, but their best days are behind them.
- Our kids will see SQL the way we see COBOL. Some people can’t stand when I say this, but I think they’re in denial. There is no logical reason to assume that the relational database and the SQL language are the endpoints in database evolution. Yes, Larry Ellison is powerful. But Adam Smith is more so.
- Our kids will see no data/document dichotomy. They will just see digital information. We need to understand and remember that the data/document dichotomy is an artifact of the limitations of the tools and technologies with which we grew up.
- Some of the NoSQL hype is an over-reaction to the database oligopoly. I believe there are organizations out there who should be using alternative commercial databases, but instead are using open source NoSQL-type projects due to coolness, anger, or a mistaken belief that open source always has a lower total cost of ownership. I believe rationality will return to these people. One day management will say: “Holy cow! Why in the world are we paying programmers to write and support software at this low a level?” (This is potentially avoidable if you can mentally project yourself into the future now and imagine how you will look back at the coming three years.)
- Some of the NoSQL hype is a valid reaction to the technological limits of relational databases and the impedance mismatch in programming on them.
In the end, I think it’s great that the NoSQL movement is happening. It’s awakening people to traditional RDBMS alternatives. It’s making people understand that they don’t have to write big checks for commodity software. It’s helping people solve problems that they can’t solve, or solve efficiently, on relational technology.
My axe to grind is simple: just because you’re throwing out Oracle, don’t throw out all DBMSs and all commercial software with it. Take a breath. Look at all your alternatives. Study total costs and technology applicability. And make your best decision.
Interesting Writings on NoSQL
- Wikipedia NoSQL entry
- The NoSQL Discussion Has Nothing to do with SQL by Michael Stonebraker
- The Legit Part of the NoSQL Idea by Curt Monash
- The MyNoSQL Blog by Alex Popescu
- Jason Hunter’s presentation on MarkLogic Server to NoSQL Oakland
- Announcing the Release of HadoopDB by Daniel Abadi
- Seeking a Database that Doesn’t Suck on Ambient Irony
- Twitter Growth Prompts Switch from MySQL to NoSQL Database by Eric Lai (Computerworld)
- No to SQL: Anti-Database Movement Gains Steam by Eric Lai
Mark Logic at O’Reilly Tools of Change 2010: Free Camera Anyone?
While I’m not able to attend myself this year, I wanted to do a quick write-up on Mark Logic’s presence at the sold-out O’Reilly Tools of Change for Publishing show this week in New York City at the Marriott Marquis.
Mark Logic is a gold sponsor this year, which sounds pretty good, though in the ever-inflating world of sponsorships that puts us behind the “premier diamond,” “premier platinum,” “platinum,” and “premier gold” levels. (This reminds me of olive sizing where “extra large” is actually the fourth largest level after super colossal, colossal, and jumbo.)
We are proudly located in booth #2 and if you stop by, be sure to enter our drawing to win a free Kodak Zi8 High-definition pocket camera. Hopefully, you’ll come by to talk about Mark Logic’s strong presence in media and the kinds of new information products that we help companies build atop our powerful XML Server. Or, better yet, to ask detailed questions about the algorithms within MarkLogic Server that enable it to process complex queries against vast 100+ TB repositories of semi-structured information. But we’ll understand if you just want to come by to enter the drawing.
Mark Logic’s own Fernando Mesa is speaking at 9:20 AM on Wednesday 2/24 in the Wilder room to present The Mobile Opportunity: Developing a New Generation of Personalized, Task-Aware Applications for Mobile Devices. Fernando’s a great speaker and I’ve already seen him give this talk at one of our internal events. Like Colin Crawford, I think that mobile is a huge, second-chance opportunity for the media business, a chance to start over and get things right, setting the expectation that content isn’t free. So I believe that all media businesses should be looking hard at mobile and figuring out ways to make money there. While this isn’t the precise topic of Fernando’s talk, here’s a bit of his abstract:
Publishers are continually challenged to find ways to differentiate their content products in the mobile space. Discover how you can increase the value of your information using a new breed of technology infrastructure that provides the tools for publishers to quickly build innovative mobile applications using location-based services, information discovery, and context-aware delivery of content. We’ll review some of the challenges publishers face in supporting a large number of devices and eReader formats (with more to come) and the importance of having a flexible platform that can quickly adapt to device and format changes. Finally, we’ll suggest ways to maximize your product reach across mobile devices by leveraging open standards and toolsets.
Finally, we co-hosted a round table on cloud computing last night at 7:00 PM, entitled MarkLogic Server in the Cloud to Integrate the Content Supply Chain. If you’re more interested in that topic, be sure to visit Mark Logic’s cloud computing center.
Top Ten Reasons To Attend the Mark Logic User Conference on May 4-6, 2010
Following is a little bit of marketing for our upcoming user conference. This is a truly great event and I encourage all Mark Logic customers, prospective customers, and partners to attend. The only way to truly “feel the love” is to attend this event!
Following in the format of a Letterman top 10 list, our marketing team has created the following:
10. Hear tales of massive scale, incredible performance, and unparalleled agility from our award-winning customers and partners.
9. Network 1-on-1 with the Mark Logic product team and ask those burning questions only they can answer.
8. Listen to the latest observations and adventures of Chris Anderson, editor-in-chief of Wired magazine and best-selling author of The Long Tail and Free.
7. Get the straight scoop on new capabilities coming in future releases of MarkLogic Server and MarkLogic Application Services.
6. Learn technical advice, tips, and best practices from our own developers and professional services team.
5. Eat, drink and be merry at the California Academy of Sciences, a world-famous facility that combines an aquarium, planetarium, natural history museum and rainforest. (Yes, this year, we’re finally having a party!)
4. Find out what’s on the mind of our illustrious founder, Christopher Lindblad, in our annual fireside chat. (This alone is worth the ticket. You can never predict what Chris will say!)
3. Meet our partners and learn how they can help you get the most from MarkLogic Server.
2. Participate in a rousing DemoJam competition — a company tradition we’re extending to the global Mark Logic community!
1. What more do you need?!?! Register today and get the $695 early bird rate.
Open Text Snags Nstein
Open Text Corp. today announced that it was acquiring Montreal-based text mining and publishing solutions vendor Nstein Technologies for CDN $0.65 per share, or CDN $35M, equivalent to US $33.5M, a 100% premium over the trailing 30-day average closing price of Nstein’s common shares, which are traded publicly on the TSX Venture Exchange (TSXV).
In its most recently reported financial period, 3Q09, Nstein reported (all figures CDN) $4.6M in revenues, and -$0.8M in EBITDA. Revenue was down 24% on a sequential basis and 17% on a year-over-year basis. Given the $18.4M run-rate and the $24.2M in TTM revenues, Open Text paid 1.9x run-rate and 1.4x TTM revenues for the small, largely text-mining focused concern. While the 100% premium is surely good news for shareholders, it’s off a valuation that is less than 1x TTM revenues (0.72x to be precise). Then again, the company was both losing money and shrinking.
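The multiples above are easy to check. Here is a quick Python sketch of the arithmetic, using only the figures quoted in this post (all in CDN $M). The variable names are mine, and the pre-deal valuation is approximated as the purchase price divided by two, which is what a 100% premium implies.

```python
# Figures quoted in the post, in CDN $M.
purchase_price = 35.0      # total deal value
quarterly_revenue = 4.6    # 3Q09 revenue
ttm_revenue = 24.2         # trailing-twelve-month revenue
premium = 1.00             # 100% premium over the 30-day average price

# Run-rate annualizes the most recent quarter.
run_rate = quarterly_revenue * 4

# Approximation: a 100% premium means the pre-deal valuation
# was about half the purchase price.
pre_deal_valuation = purchase_price / (1 + premium)

run_rate_multiple = purchase_price / run_rate
ttm_multiple = purchase_price / ttm_revenue
pre_deal_ttm_multiple = pre_deal_valuation / ttm_revenue

print(f"Run-rate revenue:      ${run_rate:.1f}M")
print(f"Price / run-rate:      {run_rate_multiple:.1f}x")
print(f"Price / TTM revenue:   {ttm_multiple:.1f}x")
print(f"Pre-deal / TTM:        {pre_deal_ttm_multiple:.2f}x")
```

The gap between the run-rate and TTM multiples is itself a sign of shrinkage: annualizing the latest quarter gives a smaller number than the last four quarters actually produced.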
I’ve charted 11 quarters of Nstein history above, which makes the picture pretty clear. Even the 2/08 acquisition of Picdar couldn’t get growth going, organic or otherwise.
In terms of focus, Nstein’s roots were in text mining. The Eurocortex acquisition brought them a poor man’s CMS, with Nstein paying less for a company than large Documentum customers pay for a license. Picdar brought them digital asset management. So you had a company doing $4.6M a quarter split across three areas: text mining, CMS, and DAM. Given the abnormally low 52% gross margins, that means a whole lot of that revenue was services, so they were maybe doing $2M a quarter in license. That’s $0.7M in license for each of the three areas, which basically rounds down to nothing. Remember the expression: if you try to be all things to all people you can end up nothing to everyone. This appears to be yet another example.
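The back-of-the-envelope split works out as follows. This is only a sketch: the $2M license figure is my rough guess from above, not a reported number, and the even split across the three areas is an assumption made purely for illustration.

```python
# Rough split of the estimated license revenue across product areas (CDN $M).
quarterly_revenue = 4.6          # 3Q09 revenue quoted above
estimated_license_revenue = 2.0  # rough guess from the post, not a reported figure
product_areas = ["text mining", "CMS", "DAM"]

# Assumption for illustration: license revenue is spread evenly across areas.
license_per_area = estimated_license_revenue / len(product_areas)

# Share of total quarterly revenue that the license estimate represents.
license_share = estimated_license_revenue / quarterly_revenue

for area in product_areas:
    print(f"{area}: about ${license_per_area:.1f}M in license per quarter")
print(f"License as a share of revenue: {license_share:.0%}")
```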
To my knowledge, this focus splitting was done in the name of “solutions” though what the company was known for — to the extent it was known at all — was text mining. I’ve previously blogged on such solutions strategies, and Nstein’s in particular: NStein 2Q08, Growth Slows: The Moldy Sandwich.
The tension highlighted in the “moldy sandwich” argument is that between creating a truly best-of-breed component (e.g., a sentiment analysis engine) and offering customers complete solutions to problems. Companies are invariably pulled by their salesforces to the latter, while most companies can only credibly offer the former. Simply put: do you want to offer your customers either great ham, great cheese, or great mayo, and ask them to build the sandwich, or do you want to offer them a complete sandwich, but made from bad ingredients? For most technology companies, I’d say you’re kidding yourself if you think you can do both.
While I’ve never been a fan of the moldy sandwich strategy, I both know and like several of the folks at Nstein, and want to offer my congratulations to them on this deal. While I’m guessing the CMS will go away and the DAM customers will be moved to Artesia, I’m reasonably sure that they have found a nice home for the text mining engine and gotten a reasonable valuation for the firm (given its trajectory) and a nice pop for shareholders in the process.
Other coverage of the deal:
- CMSWire: Open Text to Cash Out CDN $35M for Nstein
- Beyond Search: Autonomy and Open Text: Which Strategy has Stronger Legs
PR Lessons from Sports This Week: Tiger F, Lysacek A+
What a great week for learning public relations (PR) from sports figures. First, we have (yet another) figure skating controversy with Evgeni Plushenko earning “only” silver despite having done quadruple jumps which the gold medalist, Evan Lysacek, did not. Then, we have the Tiger Woods confession — after 3 months of silence — for his extramarital affairs.
I’m not judging morally or technically: I blog about business, I know little about golf and even less about figure skating. I am, however, judging PR strategy and skills in handling these situations. In my estimation, Tiger gets an F and Lysacek gets an A+.
Why?
Lysacek did a simply amazing job in last night’s interview with NBC’s Bob Costas. Either Lysacek is the best PR “natural” I have ever seen, or he has simply world-class PR advisors. Despite Costas repeatedly baiting him, Lysacek looked like a home-run hitter at batting practice, swatting away the inflammatory questions.
Excerpt (after having just shown a video of Plushenko saying that he thought he merited the gold):
Costas: “Plushenko said: ‘if the Olympic champion doesn’t know how to do quadruple jump, … now it’s not men’s figure skating, it’s dancing, … you can’t be considered a true men’s champion without the quad.’”
Lysacek: “well, I think no one likes to lose, and a lot of what he’s saying is probably coming from a little bit of disappointment and anger so, taking it out of context, I don’t think, for me, I can’t be emotional or react to it …”
That is simply a superb answer. He gets the real issue on the table (bitterness), takes the high ground, and refuses to answer the question all at the same time. But it gets better:
Lysacek, continuing: “the truth is that he’s been a force to be reckoned with in men’s skating for the last decade and has been a great role model for me … [he] did something that no one thought was possible, [took time off,] came back, and got his third Olympic medal (two silvers and a gold), and that’s not something to be taken lightly.”
Wow. Call the guy who’s attacking you a role model and then cite his accomplishments in a clear and precise way. This guy is good.
But it doesn’t stop there. Costas continues: “Plushenko said: ‘… the sport itself is regressing if the Olympic champion doesn’t do the quad, just doing nice transitions and being artistic, that’s not enough, because figure skating is a sport, not a show,’ again quoting him.”
Lysacek: “Well I think it’s interesting that he puts so much emphasis on just one step in the program. It is a 4 minute and 40 second skating routine so we have to put together our strongest moves — jumps, spins, and footwork — and we’re graded on everything we do in between …”
Here he’s answering the question, but using a powerful technique — framing — in how he answers. Sure Plushenko wants to make it about one jump, but what about the other 4 minutes and 35 seconds? It gets better:
Lysacek: “… interesting enough, last night we tied on the component scores (the old artistic scores), and where I edged him, slightly, was on the technical scores which means my jumps were graded better than his and my spins were graded better than his.”
This guy’s on fire. First, he reframes the problem back to the whole routine and then fires a cannon through the “dancing” argument by saying, “uh, by the way, I won on technical scores.” And I love the passive voice: not “my spins were better,” but “my spins were graded better.” But it gets better still:
Lysacek: “… to me he had a challenge, he had to skate last, he had to wait until the end of the event, he had the most pressure on him because he was leading after the short program, and I thought he looked incredible. He went out and skated great and, for me, I congratulate him and hope that he’s 100% satisfied with that.”
Costas: “Was he gracious to you in the immediate aftermath?”
Lysacek: “Yes, he was very nice. He’s a great guy. I’ve known him for a long time. I’ve looked up to him for a long time.”
What do I love about Lysacek?
- Great delivery
- Absolute sincerity and ergo credibility
- Great use of facts
- Refusal to engage in an emotional conflict
- Remapping the questions: saying what you want to say almost regardless of what was asked
Now, let’s look at Tiger’s confession, via some excerpts:
Now every one of you has good reason to be critical of me. I want to say to each of you, simply and directly, I am deeply sorry for my irresponsible and selfish behavior I engaged in …
But still, I know I have bitterly disappointed all of you. I have made you question who I am and how I could have done the things I did. I am embarrassed that I have put you in this position …
The issue involved here was my repeated irresponsible behavior. I was unfaithful. I had affairs. I cheated. What I did is not acceptable, and I am the only person to blame …
I stopped living by the core values that I was taught to believe in. I knew my actions were wrong, but I convinced myself that normal rules didn’t apply. I never thought about who I was hurting. Instead, I thought only about myself. I ran straight through the boundaries that a married couple should live by. I thought I could get away with whatever I wanted to. I felt that I had worked hard my entire life and deserved to enjoy all the temptations around me. I felt I was entitled. Thanks to money and fame, I didn’t have to go far to find them.
I was wrong. I was foolish. I don’t get to play by different rules. The same boundaries that apply to everyone apply to me. I brought this shame on myself. I hurt my wife, my kids, my mother, my wife’s family, my friends, my foundation, and kids all around the world who admired me.
I’ve had a lot of time to think about what I’ve done. My failures have made me look at myself in a way I never wanted to before. It’s now up to me to make amends, and that starts by never repeating the mistakes I’ve made. It’s up to me to start living a life of integrity.
Let me be a little cynical here, but in terms of frequency “star athlete / rockstar / celebrity / politician has affair” should be a dog-bites-man, not a man-bites-dog story. How is it that John Denver can write his affairs into song lyrics …
There’s so many times I’ve let you down,
So many times, I’ve played around,
I’ll tell you now, that they don’t mean a thing
… and get away with it, while Tiger gets hung out to dry? (And yes, I know there are a few decades in between.)
The first mistake Tiger made (other than the affairs) was letting this story get so big. Some of that was out of his control (e.g., his enormous popularity) but a lot of it was controllable. He could have just said earlier what he ended up saying later: look, it’s no surprise that star athletes get a lot of “temptations” and I, uh, gave in. My bad, it happens all the time, and what’s between me and my wife is none of your business. Next story, please.
But, having holed up for three months, he’s turned an “oh, another athlete had an affair” story into the Tiger Woods 24 Hours Mystery. And, unfortunately, his confession does nothing to provide the details that he should now sadly provide if he wants to kill off the mystery angle, once and for all.
His delivery was poor: scripted, stiff, hollow, robotic, insincere.
I didn’t like the way “therapy” was pitched. You could substitute “disease” for “affair” and “drug” for “therapy” and the script would still make sense. While I might sound harsh, that smacks of not taking responsibility.
The whole framing of the announcement was wrong. Who is he apologizing to? Everyone, it seems, but as one fan said: “he doesn’t owe me an apology.” Is he apologizing to the sponsors who already fired him? If so, send them a letter. In his public statement, he should be apologizing to his wife and his kids, period. The rest should be commentary for the media. Not a confession. Not an apology.
The execution was bad as well. Media attendance was limited to three reporters, alienating the journalists he’s trying to reach. The timing was in conflict with a golf event, further irritating the golf establishment. There was no Q&A, which further reinforced the stiff/scripted perception.
So what did I dislike about the Tiger confession?
- Insincere
- Scripted
- The therapy angle
- Poorly timed
- The mass apology framing
If I were Tiger’s PR advisor, I’d say the message (which should have been delivered fast) should be:
- I got caught up in the celebrity bubble
- I admit that I had affairs
- I apologize to my wife and kids for what I’ve done
- Any questions about my wife and family — either past or future — are personal, and I will not answer them
- Deep down I am unhappy and in therapy to try and fix that core problem
- I hope to return to golf within a year
- If I learn any lessons that are useful to others in this process, I hope to share them in the future (think: book!)
- This is extremely difficult for me and I thank you for your support
I value speed and authenticity in PR which is why I am so negative on the Tiger confession. But I must admit that the media has responded pretty positively to it, for example, this piece in the New York Times, entitled Vulnerability in a Disciplined Performance.
More Changes at Kellblog / Mark Logic CEO Blog
Today, we’re set to enter phase II of the transition from Mark Logic CEO Blog to Kellblog.
This afternoon, we will cut over to a new blog design. Authoring-wise, I will be changing from Blogger as my authoring tool to WordPress, though that shouldn’t directly affect my readers.
I will also be changing the RSS feeds shortly; more on that after the new feeds are up and running.
Also, remember I now tweet from @kellblog, no longer from @ramblingman.
Matt Turner 2010 Predictions from the Information Industry Summit
Just a quick post to highlight a nice one-minute video of Mark Logic’s own Matt Turner, captured making a few predictions at the SIIA’s 2010 Information Industry Summit in snowy New York City.
I’ve embedded it below:
IDC’s Definition of Search-Based Applications
Sue Feldman and the team over at IDC are talking about a new category / trend called search-based applications, and I think they may well be onto something.
Because I believe that IDC puts real thought and rigor into definitions, I pay attention when I see them attempting to define something. From past experience, IDC was about 10 years ahead of the market in predicting the convergence of BI and enterprise applications with — even in the mid 1990s — a single analyst covering both ERP and BI.
Here’s how IDC describes search-based applications.
Search-based applications combine search and/or text analytics with collaborative technologies, workflow, domain knowledge, business intelligence, or relevant Web services. They deliver a purpose-designed user interface tailored to support a particular task or workflow. Examples of such search-based applications include e-Discovery applications, search marketing/advertising dashboards, government intelligence analysts’ workstations, specialized life sciences research software, e-commerce merchandising workbenches, and premium publishing subscriber portals in financial services or healthcare.
There are many investigative or composite, text- and data-centric analysis activities in the enterprise that are candidates for innovative discovery and decision-support applications. Many of these activities are carried out manually today. Search-based applications provide a way to bring automation to a broad range of information worker tasks.
Some vendors are jumping whole hog into the nascent category. For example, French Internet and enterprise search vendor Exalead has jumped in with both feet, making search-based applications a key war cry in their marketing. In addition, Exalead’s chief science officer, Gregory Grefenstette, seems a likely match to the “Ggrefen” credited in Wikipedia with the creation of the search-based applications page.
Another vendor jumping in hard is Endeca, with the words “search applications” meriting the largest font on their homepage.
While you could argue that this is yet-another, yet-another focus for Endeca, clearly the folks in marketing — at least — are buying into the category.
At Mark Logic, we are not attempting to redefine ourselves around search-based applications. Our product is an XML server. Our vision is to provide infrastructure software for the next generation of information applications. We believe that search-based applications are one such broad class of information applications. That is, they are yet another class of applications that are well suited for development on MarkLogic Server.
So, if you’re thinking about building something that you consider a search-based application, then be sure to include us on your evaluation list.
Microsoft / Fast Drops Linux and Unix Support. Should You Turn to MarkLogic As A Replacement?
In a not-so-shocking move, this week Microsoft announced that it was dropping support for the FAST Enterprise Search Platform (FAST ESP) on Linux and other Unix operating systems. Microsoft acquired the FAST ESP and related products in January 2008 via the $1.2B acquisition of Fast Search & Transfer, a financially troubled but nevertheless leading enterprise search vendor, headquartered in Norway.
Microsoft’s move was announced in a blog post with the misleading title of Innovation on Linux and UNIX, written by former Fast CTO and Microsoft distinguished engineer, Bjørn Olstad, and posted on the Microsoft Enterprise Search Blog. Excerpt:
With our 2010 products scheduled for release in a few months, we’ve just started to plan for our next wave of products. As a part of that planning process, we have decided that in order to deliver more innovation per release in the future, the 2010 products will be the last to include a search core that runs on Linux and UNIX.
The Register put it somewhat less diplomatically: Microsoft Kills Fast’s Linux and Unix Search Business.
The real question for Fast customers is: “what next?”
Here are my (not necessarily unbiased) views on the subject.
- If you were using Fast for vanilla enterprise search, then either move to Windows with Microsoft or move to the Google Appliance. Basic enterprise search has commoditized; there’s no reason to pay top dollar for the Intranet crawl-and-index value proposition.
- If you were using Fast as an application development platform, perhaps alongside a relational database as a means of enabling applications that query both structured and unstructured content, then you should call Mark Logic.
- If you were using Fast for e-commerce search, then you should call Endeca, provided you don’t mind that they are focused on becoming a BI company. Jabs aside, Endeca’s core strength is in e-commerce search, so I would check them out.
- If you are a glutton for punishment, enjoy working with search products that use black box algorithms based on Bayes and Shannon, from a company no longer interested in search but instead focused on financial engineering through mergers and acquisitions, then you should definitely look at Autonomy.
- If you used Fast in either the media/publishing or government sectors, you should call Mark Logic. Mark Logic has strong practices in both media and government and for years, particularly in media, Fast was our #1 competitor. In media, we help build applications including custom publishing, rights management, role-based information applications, and multi-channel content delivery. In government, we work on applications that include content analytics, information sharing environments, metadata catalogs, government archives, and open source intelligence systems.
- If you used Fast in financial services, you should call Mark Logic. While this is a newer practice area, we have done work with derivatives contracts, FpML derivatives repositories, and equity and fixed income research publishing.
If you’re interested in more on this topic, below please find embedded a white paper we released in January — prior to knowing that Microsoft was discontinuing Fast support on Unix — which argues why media / publishing companies should consider moving to MarkLogic. It’s even more relevant now than when we released it.
Beyond Fast Using Marklogic Server to Drive Growth
How To Make a Great Corporate Blog

I'm happy to report that Kellblog was featured prominently in a story yesterday on Business Insider entitled How To Make An Awesome Corporate Blog.
I provided the first tip: throw "corporate" out the window.
That's because, definitionally, I don't think there are great corporate blogs. There are only great corporate bloggers.
- If you really want a "corporate" blog, try a "news and events" RSS feed instead. It will be less work and more directly meet the information need.
- If you want a ghost-written CEO blog, stop. It won't work. Give it up. (And read this post for more.)
- If you want coverage in the blogosphere, appoint smart people to engage with existing blogs/bloggers by commenting.
- If you really want your message, or some aspects of it, out through blogging, then find one or more people in the organization with the skill, time, and desire to write a blog that will indirectly benefit the company. For example, Timo Elliott at SAP writes such a blog, BI Questions.
- Throw corporate out the window
- Who should write the blog? Everyone
- Your content should go beyond your business. (I get cited here as well.)
- A blog is not about marketing (but good ones can end up doing just that)
- More content guidelines
- Get personal
- Encourage customer interaction
- If you can't do these points, then don't have one
- Awesome blogs to check out
I get another nice excerpt in the middle.
Whatever you do, your blog should not be "an advertisement for the company or a regurgitation of company news and press releases," Kellogg warns.
The full story is here. For those really interested in corporate blogging, you should check out what Debbie Weil has to say on the subject.
The Dawn of Financial Services Open Source Intelligence (OSINT)
Financial firms are particularly eager to recruit former Afghan and Iraq war vets with intelligence operations experience since they can bring new technology and techniques to research and analysis.
That's good news for Mark Logic in financial services because OSINT is:
- A key focus area of our government division
- An area in which we have developed significant expertise, not only in terms of product requirements, but also in how to support customers in building and deploying complete OSINT systems
- A topic of personal interest to me. I have blogged about it numerous times over the years, of which my favorite post was a review of a superb article on the subject, Open Secrets, by Malcolm Gladwell.
- An application that we are starting to see in other verticals. For example, when we hosted our webinar, Harvesting Deep Web Content for Open Source Intelligence, we had not only the usual suspects (i.e., government agencies), we had -- much to my surprise -- a few attendees from other markets as well.
- An application that does not lend itself well to the predefined world of traditional database technologies. It is a near perfect fit for MarkLogic. It is -- in fact and quite literally -- exactly what MarkLogic Server was originally designed to do. (Maybe one day, I'll tell that whole story.)
Today, the private sector needs timely, relevant and actionable "intelligence" to secure their businesses against potential threats, he explains. Some of this intelligence can be produced with open source information -- publicly available information that anyone can lawfully obtain.
The full story is here.
Congresswoman Jackie Speier Visits Mark Logic
First, we held a meeting with our executives in which we provided an overview of what the company does, described in more detail some of the projects we work on within government, and talked about our customers in financial services and their applications in derivatives trading and risk management.
We then gave Congresswoman Speier a tour of our offices. After that, she gave a brief presentation to the company and did some excellent "town hall" style Q&A. Among other topics, she discussed public education in California and I must say that even I was surprised to see about half the hands in the crowd go up in response to her question: "who attended the University of California?"
While a relatively new member of the House of Representatives, Congresswoman Speier has a long history in politics, having served in San Mateo County, the California State Assembly, and the California State Senate.
While we didn't talk about it during her visit, in researching her background, I was amazed to learn that in 1978, as a Congressional aide, she accompanied Congressman Leo Ryan on his fact-finding mission to Jonestown, was shot five times during the cult's ambush on that mission, and had to wait 22 hours before help arrived.
We were thrilled to have Congresswoman Speier visit us today and despite her busy agenda, we even had time to sneak in a photo.

From left, Chris Biow, Mark Logic Federal CTO; Congresswoman Jackie Speier; Dave Kellogg, Mark Logic CEO; and Christopher Lindblad, Mark Logic Founder and Chief Architect.
Endeca and BI: Yet Another Strategy, But This Time Maybe The Right One
I've written before about Endeca's strategic issues, which I've found more than a bit ironic for a company with no shortage of Harvard MBAs and 1980s strategy guru Michael Porter on its board, heading the strategy committee.
While brains are most certainly not lacking, perhaps common sense is. I think Endeca has been making a common mistake, one which I call getting bored with your market. Or, as I once quipped, "getting board" as this is often a top-down phenomenon.
A few of my favorite examples of this grass-is-greener syndrome include:
- Informatica, which decided that ETL was boring, and launched itself into the seemingly emerging but never-to-emerge analytic applications category, only to cede the ETL market to Ascential who was only too happy to serve it. This story ends happily with Informatica re-discovering its DI roots after a few years of turmoil.
- Verity, which decided that enterprise search was a dead market, repositioned itself as an intellectual capital management vendor, and thus ceded the enterprise search market to Fast Search & Transfer who was only too happy to serve it. This story ended less happily with Verity selling itself to Autonomy, a company then half its size.
- Ingres, which made the epic-fail decision that "relational databases were commodities" in the late 1980s, effectively declaring the RDBMS party over and focusing its energies on application development tools. At the time, the RDBMS market was ~$500M. Today it's a $15B oligopoly, driving 50% operating margins. Application development tools have remained an economic bloodbath for the past 20 years.
- They imagine the market as a standing army and not as a parade. In reality, the market marches past the company as it develops over time. (Think of it as a parade with a changing width.) So what might be 7 years old to the company is brand new to folks in the market walking by for the first time.
- They are under constant pressure to change. The board, bankers, financial analysts, industry analysts all constantly ask management "what's next?", "what's the roadmap?" and "what's your vision?" For some inexplicable reason, most management teams lack the courage to answer: more of the same; we think we've penetrated about 10% of the total market and we intend to keep on keeping on, growing both domestically and internationally, becoming the worldwide dominant leader in what we do. Instead, they talk about new markets, new products, obsolescing and de-positioning their current offerings in the process.
- They confuse stack layer with market structure. Most people who sell platform technologies are frustrated because they can't sell as a direct solution to a hot business problem. Imagine solution selling a business executive on electricity. You can't. There are too many use-cases. They assume that profit corresponds with directness to business value when in reality, profit is determined by industry structure and its profit zones. Relational databases are one heck of a profitable business, but they do not map to solving one "silver bullet" business problem.
- They are afraid. My favorite example was Arbor Software, makers of Essbase, who did a panicked and subsequently disastrous merger with Hyperion Software in response to Microsoft's acquisition of Panorama software, which provided the foundation for Microsoft Analysis Services. Yes, Microsoft's market entry was a problem, but there was plenty of room for both high-end and low-end offerings. I always viewed the combination as the emergency evacuation of a beach in San Francisco in response to a shark sighting in Los Angeles.
Example: typing Cabernet into a wine store's search engine should allow you to iteratively refine your query along several dimensions, such as price (<$10, $10-$25, $25+), bottle size (375 ml, 750 ml, 1500 ml), country (US, France, Australia), region (which is hierarchically dependent on country, meaning France-->Bordeaux, US-->California-->Napa), and Parker points (<80, 80-89, 90-95, 96-100).
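For readers who think better in code, here is a minimal sketch of that kind of iterative refinement. It is an illustration only, not how Endeca (or anyone else) implements it: the catalog, the field names, and the helper functions are all hypothetical, and the price bands are a simplified version of the ones above.

```python
from collections import Counter

# Hypothetical wine catalog; every name, field, and value is illustrative.
WINES = [
    {"name": "Napa Reserve", "varietal": "Cabernet", "price": 48, "country": "US", "region": "Napa"},
    {"name": "Table Red", "varietal": "Cabernet", "price": 9, "country": "US", "region": "Sonoma"},
    {"name": "Chateau Exemple", "varietal": "Cabernet", "price": 22, "country": "France", "region": "Bordeaux"},
    {"name": "Coonawarra Estate", "varietal": "Cabernet", "price": 30, "country": "Australia", "region": "Coonawarra"},
    {"name": "Village White", "varietal": "Chardonnay", "price": 15, "country": "France", "region": "Burgundy"},
]


def price_band(price):
    """Map a price to a display bucket (boundaries are an assumption)."""
    if price < 10:
        return "<$10"
    if price < 25:
        return "$10-$25"
    return "$25+"


# Each facet is a function that computes one dimension's value for a wine.
FACETS = {
    "price": lambda w: price_band(w["price"]),
    "country": lambda w: w["country"],
    "region": lambda w: w["region"],
}


def search(wines, keyword):
    """Keyword search: case-insensitive match on the varietal."""
    return [w for w in wines if keyword.lower() in w["varietal"].lower()]


def facet_counts(results):
    """Count the current results along every facet, for display next to the hits."""
    return {name: Counter(get(w) for w in results) for name, get in FACETS.items()}


def refine(results, facet, value):
    """Narrow the current result set to one value of one facet."""
    return [w for w in results if FACETS[facet](w) == value]
```

A session would call `search(WINES, "cabernet")`, show the `facet_counts` beside the results, and then apply `refine` once per click, for example first on `country` and then on `price`. Because each refinement runs on the already narrowed set, the region counts shown after choosing a country only include that country's regions, which is the hierarchical dependency described above.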
This is what Endeca is good at. This is what they grew up doing. And this is what I think they're bored with.
That boredom has resulted in a number of initiatives and reorganizations over the years. For an order-of-magnitude $100M company, look at everything they discuss on their website:
- Specialization in 5 industries
- Production of 5 products
- Expertise in about 20 different solution areas, total
The e-commerce roots were always there. They made a modest push in media/publishing. They also made a push into government/intelligence, but I think product scalability concerns hobbled the effort. The strategic push prior to BI was in manufacturing.
From the cheap seats, my advice for Endeca is simple.
- Either, return to the historical focus, a la Informatica, tripling-down on e-commerce with a goal towards owning that market, worldwide
- Or, bet all-in on a strategic evolution to BI.
Frankly, my gut is that the first market is big enough, that they're underestimating it, and that the best strategy would be a roots rediscovery. But there are also some good reasons to pursue the second strategy:
- BI is largely a front-end game, where DBMSs do the heavy lifting. I believe that Endeca has always been better at data presentation and user interface than building scalable processing engines, so it seems to fit with the company's core competence.
- It's been done before. Cognos, today a top BI vendor (and piece of IBM), did a nearly miraculous transition from application development tools (Powerhouse) to BI tools (Impromptu, Powerplay) over about a seven-year period in the 1990s. Endeca wouldn't be the first market refugee to seek asylum in BI.
- Unification of structured and unstructured information is the next big thing in BI and, given its heritage, Endeca is well positioned to provide it. I've always believed that it's easier to make systems designed for unstructured data work with structured data than the inverse.
But Endeca's not your typical search vendor; they've paid attention to both content and data since day one. And they make pretty user interfaces, which are a critical success factor in BI. They've started to attack the data integration problem by licensing Informatica, so they are serious about this initiative; it's not just marketing fluff.
All they need to do is build it atop the right DBMS platform for managing structured and unstructured information. I don't think that's the MDEX engine. Nor any bolting of MySQL to Lucene.
Hint, hint.
Coming Friday 1/29/10: Kellblog!
- Get a shorter, pithier name that will be easier for people to write and talk about
- Get a more normal, blog-like name that will hopefully increase citations and in-bound links
- Get a name that better reflects the content of the blog. While the blog certainly contains some pro-Mark-Logic posts, the majority of the content is not typical "corporate blog" fodder
- Site readers will automatically be redirected to the new domain: www.kellblog.com
- Feed subscribers using the proper Feedburner feed will need to do nothing, since -- for the time being -- the feed address will remain http://feeds.feedburner.com/marklogic. (At some future point, we'll switch the feed, but we have plenty of other work to do first.)
- On Friday, February 12th, 2010, I intend to cut over to a fresher, crisper, simpler design to provide the blog with a new, and more contemporary, look.
- I also intend to switch work-related tweets to a new account @kellblog, as opposed to my original Twitter account @ramblingman, from which I no longer expect to tweet. So, please follow @kellblog right now!
Six Things Publishers Should Be Able To Do With Content
The slide was a list of six things that publishers should be able to do with their content. For this blog post, I'd say the scope includes publishers of any ilk: professional publishers for whom content is their business and "accidental" publishers -- i.e., enterprises whose primary business is not content publishing, but where content nevertheless plays a mission-critical role (e.g., doctrine for the Army, in-flight manuals for airlines, or maintenance procedures for medical devices, such as PET scanners).
So, if content either is your business or is mission-critical to it, then here are the six things you should be able to do with it:
- Integrate it. Content is more valuable when it's integrated with other content. Typically this means putting it in one place and then transforming it -- over time -- to a common structure/schema. Note that many systems require a "big bang" approach that requires 100% cleansed content as the first step. This artificial technology constraint dooms many projects to failure because that first step's a doozy and is typically never completed before the business runs out of budgetary patience. Instead of trying to clean the Augean Stables as step one, adopt a lazy approach to content transformation, cleansing, and enrichment.
- Enrich it. Content can be made more valuable by enriching it, using text-mining tools to identify entities such as people and places, phone numbers and credit card numbers, geopolitical organizations, or diseases and symptoms. No matter which entities are important to your content, the odds are you can find a text mining tool that will identify them. But, whatever you do, don't extract the entities from your content by loading them into relational tables that say "document 17 mentions Paris." Instead, enrich the content itself through the addition of in-line markup that says <city>Paris</city> directly in the text. In-line markup allows for much more powerful queries than entity extraction. So don't extract from your content; enrich it, instead.
- Slice and dice it. Much as good business intelligence tools let you slice and dice data, so should good content tools let you slice and dice content. Slicing and dicing means pulling content in any way that you want. You want all the section headers to dynamically build a table of contents? Great. You want all the figures and captions, only? Great. You want all the chapters in a corpus sorted by relevancy to a specific phrase? Great. You want the abstracts of articles written by a given author in a certain time period? Great. Slicing and dicing content means querying it along any dimension you want, instantly. When you can slice and dice content, you can repurpose it into new products in virtually unlimited ways.
- Deliver it. You should be able to deliver content from one repository to all of your distribution channels: web, print, feeds (e.g., RSS, Atom), BlackBerries, iPhones, the Kindle, other e-readers, 508-compliant readers, other phones, and -- heck -- even the rumored iTablet. The point of multi-channel publishing is to fully separate formatting from structure so that you can dynamically render content from a central repository to all of the various forms -- some existing and many not yet existing -- that your content consumers want. The rapid transformation of XML is key to delivering on this vision.
- Analyze it. Today's readers don't just want to consume content, they want to surf it and analyze it. They want dynamic wordclouds or tagclouds. They want to do frequency analysis. They may want to analyze co-occurrence -- e.g., between side-effects and drugs or symptoms and diseases. They want to count results and to slice and dice those counts using facets. They want to be able to feed visualization tools to create interfaces such as hyperbolic trees. It's no longer enough to simply locate and deliver content: both your consumers and your internal producers want statistics both to learn more from the content and to determine who's reading what to assist in future planning.
- Contextualize it. The Holy Grail of publishing is to put content in context. For example, rather than teaching a pilot a table full of information about descent rates at various altitudes, give him the one descent rate recommended for the specific airplane he's flying into a given airport at a specific altitude. Instead of dumping a tome full of slides under various stains on a pathologist, give him an application that walks him through the process of differential diagnosis of a given tumor. Instead of dumping documentation on service personnel, give them a laptop that outlines the exact steps -- specific to a given make, model, and unit -- for performing maintenance on an expensive medical device. Instead of a generic lesson for a student, intermix content and exercises in a way that's specific and optimal for their apparent knowledge.
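To make a few of these points concrete (enriching with in-line markup, slicing and dicing, and simple frequency analysis), here is a rough sketch using Python's standard xml.etree.ElementTree module. It is not MarkLogic code and not a recommended implementation. The two sample documents, the element names (article, title, para, city, person), and the helper functions are all assumptions made up for illustration, and the sketch presumes a text-mining step has already added the in-line markup.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical enriched documents. Element names and content are illustrative;
# a text-mining tool is assumed to have already wrapped the entities in-line.
DOCS = {
    "doc17": (
        "<article><title>Spring travel</title>"
        "<para>Flights to <city>Paris</city> rose while "
        "<person>Ann Lee</person> visited <city>Lyon</city>.</para></article>"
    ),
    "doc18": (
        "<article><title>Paris, Texas</title>"
        "<para>The film was screened in <city>Austin</city>.</para></article>"
    ),
}


def parse_all(docs):
    """Parse every document into an element tree, keyed by document ID."""
    return {doc_id: ET.fromstring(xml) for doc_id, xml in docs.items()}


def _has_entity(element, tag, value):
    """True if `element` contains `value` marked up with the entity `tag`."""
    return any((el.text or "").strip() == value for el in element.iter(tag))


def docs_mentioning(trees, tag, value):
    """Enrich: find documents where `value` is tagged as a `tag` entity."""
    return sorted(doc_id for doc_id, root in trees.items() if _has_entity(root, tag, value))


def titles(trees):
    """Slice and dice: pull only the titles, e.g. to build a table of contents."""
    return [root.findtext("title") for root in trees.values()]


def paragraphs_with(trees, tag, value):
    """Slice and dice: return the plain text of paragraphs containing an entity."""
    return [
        "".join(para.itertext())
        for root in trees.values()
        for para in root.iter("para")
        if _has_entity(para, tag, value)
    ]


def entity_counts(trees, tag):
    """Analyze: count how often each entity of a given type occurs."""
    return Counter((el.text or "").strip() for root in trees.values() for el in root.iter(tag))
```

With these sample documents, `docs_mentioning(parse_all(DOCS), "city", "Paris")` returns only doc17, even though doc18 has the word Paris in its title. A plain keyword match could not make that distinction, and a table row saying "document 17 mentions Paris" would lose the surrounding paragraph that `paragraphs_with` can still return.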

