Web Data Services Extend Data Access and Distribution Beyond the RDB-BI Straightjacket


Full transcript of podcast with a focus on information management for business intelligence.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect.

Today, we present a sponsored podcast discussion on how to make the most of web data services for business intelligence (BI). As enterprises seek to gain better insights into their markets, processes, and business development opportunities, they face a daunting challenge -- how to identify, gather, cleanse, and manage all of the relevant data and content being generated across the Web.

In Part 1 of our series, we discussed how external data has grown in both volume and importance across internal Internet, social networks, portals, and applications in recent years. As the recession forces the need to identify and evaluate new revenue sources, businesses need to capture such web data services for their BI to work better and more fully.

Enterprises need to know what's going on and what's being said about their markets across those markets. They need to share those web data service inferences quickly and easily across their internal users. The more relevant and useful content that enters into BI tools, the more powerful the BI outcomes -- especially as we look outside the enterprise for fast shifting trends and business opportunities.

In this podcast, Part 2 of the series with Kapow Technologies, we identify how BI and web data services come together, and explore such additional subjects as text analytics and cloud computing.

So, how to get started and how to affordably bring web data services to BI and business consumers as intelligence and insights? Here to help us explain the benefits of web data services and BI, is Jim Kobielus, senior analyst at Forrester Research.

Jim Kobielus: Hi, Dana. Hello, everybody.

Gardner: We're also joined by Stefan Andreasen, co-founder and chief technology officer at Kapow Technologies. Welcome, Stefan.

Stefan Andreasen: Thank you, Dana. I'm glad to be here.

Gardner: Jim, let's start with you. Let's take a look at what's going on in the wider BI field. Is it true that the more content you bring into BI the better, or are there trade-offs, and how do we manage those tradeoffs?

The more the better

Kobielus: It's true that the more relevant content you bring into your analytic environment the better, in terms of having a single view or access in a unified fashion to all the information that might be relevant to any possible decision you might make within any business area. But, clearly, there are lots of caveats, "gotchas," and trade-offs there.

One of these is that it becomes very expensive to discover, to capture, and to do all the relevant transformation, cleansing, storage, and delivery of all of that content. Obviously, from the point of view of laying in bandwidth, buying servers, and implementing storage, it becomes very expensive, especially as you bring more unstructured information from your content management system (CMS) or various applications from desktops and from social networks.

So, the more information of various sorts that you bring into your BI or analytic environment, it becomes more expensive from a dollars-and-cents standpoint. It also becomes a real burden from the point of view of the end user, a consumer of this information. They are swamped. There's all manner of information.

If you don't implement your BI environment, your advanced analytic environment, or applications in a way that helps them to be more productive, they're just going to be swamped. They're not going to know what to do with it -- what's relevant or not relevant, what's the master reference, what's the golden record versus what's just pure noise.

So, there is that whole cost on productivity, if you don't bring together all these disparate sources in a unified way, and then package them up and deliver them in a way that feeds directly into decision processes throughout your organization, whether HR, finance, or the like.

Gardner: So, as we look outside the organization to gain insights into what market challenges organizations face and how they need to shift and track customer preferences, we need to be mindful that the fire hose can't just be turned on. We need to bring in some tools and technologies to help us get the right information and put it in a format that's consumable.

Kobielus: Yes, filter the fire hose. Filtering the fire hose is where this topic of web data services for BI comes in. Web data services describes that end-to-end analytic information pipelining process. It's really a fire hose that you filter at various points, so that the end users turn on their tap and they're not blown away by a massive stream. Rather, it's a stream of liquid intelligence that is palatable and consumable.

Gardner: Stefan, from your perspective in working with customers, how wide and deep do they want to go when they look to web data services? What are we actually talking about in terms of the type of content?

Andreasen: Referring back to your original question, where you talk about whether we need more content, and whether that improves the analysis and results that analysts are getting, it's all about, as Jim also mentioned, the relevance and timeliness of the data.

There is a fire hose of data out there. Some of that data is flowing easily, some of it might only be dripping, and some might not be accessible at all. Maybe I should explain the concept.

Think about it this way. The relevant data for your BI applications is located in various places. One is in your internal business applications. Another is your software-as-a-service (SaaS) business application, like Salesforce, etc. Others are at your business partners, your retailers, or your suppliers. Another one is at government. The last one is on the World Wide Web in those tens of millions of applications and data sources. There is very often some relevant information there.

Accessible via browser

Today, all of this data that I just described is more or less accessible in a web browser. Web data services allow you to access all these data sources, using the interface that the web browser is already using. It delivers that result in a real-time, relative, and relevant way into SQL databases, directly into BI tools, or even as service-enabled and encapsulated data. The benefit is that IT can now better serve the analysts' need for new data, which is almost always the case.

BI projects happen in two ways. One is that you make a completely new BI. You get a completely new BI system, and then make brand-new reports, and new data sources. That's the typical BI project.

What's even more important is the incremental daily improvement of existing reports. Analysts sit there, they find some new data source, they have their report, and they say, "It would be really good, if I could add this column of data to my report, maybe replace this data, or if I could get this amount of data in real-time rather than just once a week." So it's those kinds of improvements that web data services also really can help with.

Gardner: Jim Kobielus, it sounds like we've got two nice opportunities here. One is the investments that have already been made in BI internally, largely for structured data. Now, we have this need to look externally and to look at the newer formats internally around web content and browser-based content. We need to pull these together.

Kobielus: There are a lot of trends. One of them is, of course, self-service mashups by end users of their own reports, their own dashboards, and their own views of data from various sources, as well as their data warehouses, data marts, OLAP cubes and the like.

But, another one gets to what you're asking about, Dana, in terms of trends in BI. At Forrester, we see traditional BI as a basic analytics environment, with ad-hoc query, OLAP, and the like. That's traditional BI -- it's the core of pretty much every enterprise's environment.

Advanced analytics, building on that initial investment and getting to this notion of an incremental add-on environment is really where a lot of established BI users are going. Advanced analytics means building on those core reporting, querying, and those other features with such tools as data mining and text analytics, but also complex event processing (CEP) with a front-end interactive visualization layer that often enables mashups of their own views by the end users.

When we talk about advanced analytics, that gets to this notion of converging structured and unstructured information in a more unified way. Then, that all builds on your core BI investment -- smashing the silos between data mining and text mining that many organizations have implemented for good reasons. These are separate projects, probably separate users, separate sources, separate tools, and separate vendors.

We see a strong push in the industry towards smashing those silos and bringing them all together. A big driver of that trend is that users, the enterprises, are demanding unified access to market intelligence and customer intelligence that's bubbling up from this massive Web 2.0 infrastructure, social networks, blogs, Twitter and the like.

Relevant to ongoing activities

That's very monetizable and very useful content to them in determining customer sentiment, in determining a lot of things that are relevant to their ongoing sales, marketing, and customer service activities.

Gardner: So, we're not only trying to bring the best of traditional BI with this large pool of valuable information from web data services. We're also trying to extend the benefits of BI beyond just the people who can write a good SQL query, the proverbial folks in the white lab coats behind the glass windows. We're trying to bring those BI analytics out to a much larger class of people in the organization.

Kobielus: Exactly. SQL queries are the core of traditional BI and data warehousing in terms of the core access language. Increasingly, in the whole advanced analytics space, SQL is becoming just one of many access techniques.

One might, in some ways, describe the overall trend as toward more service-oriented architecture (SOA) access of disparate sources through the same standard interfaces that are used everywhere else for SOA applications. In other words, WS/XML, WSDL, SOAP, and much more.

So, SOA is coming to advanced analytics, or is already there. SOA, in the analytics environment, is enabled through a capability that many data federation vendors provide. It's called a "semantic virtualization layer." Basically, it's an on-demand, unified roll up of disparate sources.

It transforms them all to a common set of schemas and objects, which are then wrapped in SOA interfaces and presented to the developer as a unified API or service contract for accessing all this disparate data. SOA really is the new SQL for this new environment.
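
To make the idea of a semantic virtualization layer concrete, here is a minimal Python sketch. It is an illustration only, not any vendor's implementation: the two sources, their column names, and the function `unified_sales` are all hypothetical. It shows two differently shaped sources being mapped onto one common schema and presented through a single access function.

```python
import sqlite3

# Hypothetical source 1: an internal SQL database with its own column names.
def make_warehouse():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (cust_id TEXT, amt_usd REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("C1", 120.0), ("C2", 75.5)],
    )
    return conn

# Hypothetical source 2: records captured from a partner's web application,
# with different field names and amounts stored as strings.
PARTNER_FEED = [
    {"customer": "C3", "total": "40.25"},
    {"customer": "C1", "total": "10.00"},
]

def from_warehouse(conn):
    """Map warehouse rows onto the common schema."""
    query = "SELECT cust_id, amt_usd FROM sales ORDER BY cust_id"
    for cust_id, amount in conn.execute(query):
        yield {"customer_id": cust_id, "amount": float(amount), "source": "warehouse"}

def from_partner(feed):
    """Map partner records onto the same common schema."""
    for row in feed:
        yield {
            "customer_id": row["customer"],
            "amount": float(row["total"]),
            "source": "partner",
        }

def unified_sales(conn, feed):
    """One access point: callers never see the source-specific schemas."""
    return list(from_warehouse(conn)) + list(from_partner(feed))

records = unified_sales(make_warehouse(), PARTNER_FEED)
for record in records:
    print(record)
```

In a real deployment, the unified function would more likely sit behind a SOAP or REST interface than be called directly. The point of the sketch is only the mapping of disparate schemas to one contract.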

Gardner: Stefan, what is holding back organizations from being able to bring more of this real-time, highly actionable information vis-à-vis web services? What's preventing them from bringing this into use with their BI and analytics activity?

Andreasen: First, let me comment on what Jim said, and then try to answer your question. Jim's comment about SOA coming to BI is really spot on.

The world is more diverse

Traditionally, for BI, we've been trying to gather all the data into one unified, centralized repository, and accessing the data from there. But, the world is getting more diverse and the data is spread in more and different silos. What companies realize today is that we need to get service-level access to the data, where they reside, rather than trying to assemble them all.

So, tomorrow's data stores of BI, and today's as well -- and I'll give you an example -- is really a combination of accessing data in your central data repositories and then accessing them where they reside. Let me just explain that by an example.

One Fortune 500 financial services company spent three years trying to build a BI application that would access data from their business partners. The business partners are big banks spread all over the U.S. The effort failed, but they had to solve this problem, because it was a legal and regulatory necessity for them.

So, they had to do it with brute force. Basically, they had analysts logging into their business partners' web sites and business applications, and copying and pasting those data into Excel to deliver those reports.

Finally, we got in contact with them, and we solved that problem. Web data services can encapsulate or wrap the data silos that were residing with their business partners into services -- SOAP services, REST services, etc. -- and thereby get automated access to the data directly into the BI tool. So, the problem they tried to solve for three years could now be solved with data services, and is running really successfully in production today.
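
As a rough illustration of what wrapping a browser-facing application as a data feed can mean, the following Python sketch turns an HTML table into structured records using only the standard library. This is not how the Kapow product works internally (that is not described here). The sample page, the class `TableParser`, and the function `page_to_feed` are hypothetical, and a real partner site would also require login and navigation steps that are omitted.

```python
import json
from html.parser import HTMLParser

# Hypothetical page content, standing in for what an analyst sees in a browser.
SAMPLE_PAGE = """
<table>
  <tr><th>Bank</th><th>Balance</th></tr>
  <tr><td>First Example Bank</td><td>1200.50</td></tr>
  <tr><td>Second Example Bank</td><td>830.00</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def page_to_feed(html):
    """Use the header row as field names and return one dict per data row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    keys = [name.lower() for name in header]
    return [dict(zip(keys, row)) for row in body]

feed = page_to_feed(SAMPLE_PAGE)
feed_json = json.dumps(feed)  # ready to serve from a REST endpoint or load into a BI tool
print(feed_json)
```

Once the data is in this shape, it can replace the manual copy-and-paste into Excel that the example describes.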

Kobielus: Dana, before we go to the next question, I want to extend what Stefan said, because that's very important to understand this whole space. This new paradigm, where SOA is already here in advanced analytics, is enabled by mashup. I published a report recently called Mighty Mashups that talks about this trend.

You need two core things in your infrastructure to make this happen. One is data mashups. In the back end, in the infrastructure, you need to have orchestrated integration, transformations, consolidation, and joining among disparate data sets. Then, you expose those composite data objects as services through SOA.

Then, in the front end, you need to enable end users to have access to these composite data objects through a registry, or whatever you call it, that's integrated into the environments where the user actually does work, whether it's their browsers/portal, Excel, or Microsoft Office environment. So, it's the presentation mashup on the user front end, and data mashup -- a.k.a. composite data objects -- on the back end to make this vision a reality.
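
A back-end data mashup of the kind described here can be sketched in a few lines of Python. Everything in this example is assumed for illustration: the two data sets, their field names, and the function `build_composite`. It joins an internal data set with an external one and returns a composite data object that a front-end tool could consume.

```python
from collections import defaultdict

# Hypothetical internal data, e.g. from a data warehouse.
INTERNAL_SALES = [
    {"product": "router", "units": 120},
    {"product": "switch", "units": 80},
]

# Hypothetical external data, e.g. sentiment scores derived from web content.
EXTERNAL_MENTIONS = [
    {"product": "router", "sentiment": 0.5},
    {"product": "router", "sentiment": 1.0},
    {"product": "switch", "sentiment": -0.5},
]

def build_composite(sales, mentions):
    """Join the two data sets on product and summarize the external side."""
    scores_by_product = defaultdict(list)
    for mention in mentions:
        scores_by_product[mention["product"]].append(mention["sentiment"])

    composite = []
    for sale in sales:
        scores = scores_by_product.get(sale["product"], [])
        composite.append({
            "product": sale["product"],
            "units": sale["units"],
            "mentions": len(scores),
            # None marks products with no external mentions at all.
            "avg_sentiment": sum(scores) / len(scores) if scores else None,
        })
    return composite

composite = build_composite(INTERNAL_SALES, EXTERNAL_MENTIONS)
for row in composite:
    print(row)
```

The presentation mashup on the front end would then work against `composite` rather than against the two underlying sources.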

Gardner: So, what's been holding back this ability to use a variety of different data types, content types, and data services in relation to BI has been proprietary formats, high cost and complexity, laborious manual processes, perhaps even spreadsheets, and a little older way of presenting information. Is that fair, Stefan?

Andreasen: I think so, yes. This is also where web data services technology comes into play. Who knows best what data they want? It's the analysts, right? But who delivers the data? It's the IT department.

Tools are lacking

Today, the IT department often lacks tools to deliver those custom feeds that the line of business is asking for. But, with web data services, you can actually deliver these feeds. The data that the line of business is asking for is almost always data they already know, see, and work with in the business applications, with the business partners, etc. They work with the data. They see it in their browsers, but they cannot get the custom feeds. With the web data services product, IT can deliver those custom feeds in a very short time.

Let me use an example here again. This is a real story. Suppose I am the CEO of one of the largest network equipment manufacturers in the world. I am running a really complex business, where I need to understand the sales figures and the distribution model. I possibly have hundreds of different systems and variables I need to look at to run my business.

Another fact is I am busy. I travel a lot. I'm often in the airport or where I don't have access to my systems. When I finally get access, I have to open my laptop, get on the Net, and pull out my report.

What we did here was we took our product, service enabled the relevant reports, built a Blackberry front end to that, and delivered that in three hours, from start to end. So, suddenly, in a very agile fashion, the CEO could reach his target and look at his data anywhere he had wireless access.

Gardner: It must be very frustrating for these analysts, business managers, and business development people to be able to see content and data out on the web through their browser, but not be able to get it into context with their internal BI systems, and get those dashboards and views that allow a much fuller appreciation of what's really going on.

Andreasen: It's almost absurd. Think about it. I'm an analyst and I work with the data. I feel I own the data. I type the data in. Then, when I need it in my report, I cannot get it there. It's like owning the house, but not having the key to the house. So, breaking down this barrier and giving them the key to the house, or actually giving IT a way to deliver the key to the house, is critical for the agility of BI going forward.

Kobielus: I agree. Here's an important point I want to make as well. The key to making this all happen, making this mashup vision a reality in the final analysis, is expanding the flexibility of your data or source discovery capabilities within the infrastructure.

Most organizations that have a BI environment have one or more data warehouses aggregating and storing the data and they've got pre-configured connections and loading of data from specific sources into those data warehouses. Most users who are looking at reports in their BI environment are looking only at data that's pre-connected, pre-integrated, pre-processed by their IT department.

The user feels frustration, because they go on the Web and into Google and can see the whole universe of information that's out there. So, for a mashup vision to be reality, organizations have got to go the next step.

Much broader range

It's good to have these pre-configured connections through extract, transform and load (ETL) and the like into their data warehouse from various sources. But, there should also be ideally feeds in from various data aggregators. There are many commercial data aggregators out there who can provide discovery of a much broader range of data types -- financial, regulatory, and what not.

Also, within this ideal environment there should be user-driven source discovery through search, through pub-sub, and a variety of means. If all these source-discovery capabilities are provided in a unified environment with common tooling and interfaces, and are all feeding information and allowing users to dynamically update the information sets available to them in real-time, then that's the nirvana.

That means your analytic environment is continuously refreshed with information that's most relevant to end users and the decisions they are making now.

Gardner: So, we've identified the problem, and that's bringing the best of web services and web data into the best of what BI does and then expanding the purview of that beyond the white lab coats crowd, into the people who can take action on it. That's great. But, with the fire hose, we can't just start allowing this access to these data services without what the IT department considers critical. That is to keep the cost down, because we're still in recession and the budgets are tight.

We also need to have governance. We need to have manageability. We need to make the IT people feel like they can be responsible in opening up this filtered fire hose. So how do we do that, Stefan? How do we move from pure web static to enterprise-caliber web data services?

Andreasen: Thank you for mentioning that. Jim, to get back to you on mashups, that's really relevant. Let's just look at the realities in IT departments today. They're probably understaffed. They've probably got budget cuts, but they have more demand from lines of business, and they probably also have more systems they have to maintain. So, they're being pushed from all sides.

What's really necessary here is a new way of solving this problem. This is where Kapow and web data services come in, as a disruptive new way of solving a problem of delivering the data -- the real-time relevant data that the analyst needs.

The way it works is that, when you work with the data in a browser, you see it visually, you click on it, and you navigate tables and so on. The way our product works is that it allows you to instruct our system how to interact with a web application, just the same way as the line of business user.

This means that you access and work with the data in the world in which the end users see the data. It's all with no coding. It's all visual, all point and click. Any IT person can, with our product, turn data that you see in a browser into a real feed, a custom feed, virtually in minutes or in a few hours for something that would typically take days, weeks, or months -- or may even be impossible.

Hand in hand

So a mashup is really an agile business application, a situational application. How can you make situational BI without agile data, without situational data? They basically go hand in hand. For mashups to deliver on the promise, you really need a way to deliver the data feeds in a very agile fashion.

Gardner: But what about governance and security?

Andreasen: Web data services access the data in the way you do from a web browser. All data resides in a database somewhere -- inside your firewall, at a customer, at a partner, or somewhere. That database is very secure. There's no way to access the database, without going through tedious processes and procedures to open a hole in that firewall.

The beauty with web data services is that it's really accessing the data through the application front end, using credentials and encryptions that are already in place and approved. You're using the existing security mechanism to access the data, rather than opening up new security holes, with all the risk that that includes.

Gardner: Jim, from some of the reports that you've done recently, what are customers, the enterprise customers, telling you about what they need in terms of better access to web data services, but also mindful about the requirements of IT around security and governability and so forth?

Kobielus: Right, right. The core theme I'm hearing is that mashups, user self-service development, and maintenance of user disparate data are very, very important, for lots of reasons. One, of course, is speeding delivery of analytics and allowing users to personalize it, and so forth. But, mashups without IT control is essentially chaos. And, mashups without governance is an invitation to chaos.

What does governance mean in this environment? Well, it means that users should be able to mash up and create their own reports and dashboards, but, from the perspective of the companies that employ them, they should only be able to mash up from company-sanctioned sources, such as data warehouses, data marts, and external sources.

They should be able to mash up only the data, tables, records, or fields that they have authorized access to. They should only be able to mash up within the bounds of particular templates, reports, and dashboards that are sanctioned by the company and maintained by IT. There should be ongoing monitoring of access, utilization, and refreshes.

Then, users should be able to share their mashups with other users to create ever more composite mashups, but they should only be able to share data analytics that the recipient has authorized access to.

Now, this sounds like fascism, but it really isn't, because in practice what goes on is that users are usually given a long leash in a mashup environment to be able to pull in external data, when need be, with IT being able to monitor the utilization or the access of that data.

Fundamentally, governance comes down to the fact that all the applications are stored within a metadata environment -- repositories, and so forth -- that are under management by IT. So, that's the final piece in the mashup governance equation.
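
The governance rules listed above (sanctioned sources only, field-level authorization, monitored access, and sharing only what the recipient may see) can be expressed as simple checks. The Python sketch below is a hypothetical illustration: the users, sources, permission tables, and function names are invented, and a real environment would keep them in an IT-managed metadata repository rather than in module-level variables.

```python
# Hypothetical governance metadata, maintained by IT.
SANCTIONED_SOURCES = {"warehouse", "partner_portal"}
FIELD_ACCESS = {
    "alice": {"region", "revenue", "margin"},
    "bob": {"region", "revenue"},
}
ACCESS_LOG = []  # lets IT monitor who mashed up what

def build_mashup(user, source, fields):
    """Create a mashup only from sanctioned sources and authorized fields."""
    if source not in SANCTIONED_SOURCES:
        raise PermissionError(f"{source!r} is not a company-sanctioned source")
    denied = set(fields) - FIELD_ACCESS.get(user, set())
    if denied:
        raise PermissionError(f"{user!r} may not access fields: {sorted(denied)}")
    ACCESS_LOG.append((user, source, tuple(fields)))
    return {"owner": user, "source": source, "fields": list(fields)}

def can_share(mashup, recipient):
    """A mashup may be shared only if the recipient is authorized for every field."""
    return set(mashup["fields"]) <= FIELD_ACCESS.get(recipient, set())

margin_view = build_mashup("alice", "warehouse", ["region", "margin"])
revenue_view = build_mashup("alice", "warehouse", ["region", "revenue"])

print(can_share(margin_view, "bob"))   # bob lacks access to "margin"
print(can_share(revenue_view, "bob"))  # bob is authorized for both fields
```

Users stay free to build and share within these bounds, which is the "long leash" described above, while IT keeps the log and the permission tables.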

Gardner: I think I'm hearing you say that you really should have an intermediary between all of that web data and your BI analytics and the people making the decisions, not only for those technical reasons, but also to vet the quality of the data.

It’s in IT’s interest

Kobielus: Exactly. This is in IT's interest, and they know that. IT wants to insource as much of the development and maintenance of reports and dashboards and the like as they can get away with, which means it's pushed down to the end user to do the maintenance themselves on their own views.

IT is more than happy to go toward mashup, if there is the ability for them to keep their eyes and ears open, to set the boundaries of the sandbox, and insource to end users.

Gardner: Stefan, I want to go back to you, if I could. We talked about how to bring this into IT, but we also need to bring in to this the role of the developer, because we're just not talking about integration, we're also talking about presentation.

Does what Kapow brings to the table also allow those developers to go about the task of exposing web data services within the context of applications, views, different audit presentation, dashboards, and what not? What's the role of the developer in this?

Andreasen: That's very important. We talked about this fire hose before. When I see a fire hose in front of me, I imagine the analyst can now open this fire hose and all the data in the world is just splashing in their face, and that's really not the case. Web data services allow the developer inside the IT department to much more quickly develop and deliver those custom feeds or those custom web services that the analysts need in the BI tools.

Also, on governance, the reality is that the data that has value is data that comes from business partners, from government, or from sources where you have a business relationship, and therefore can govern it. But, for various reasons, you cannot rewrite those applications, and you cannot access those SQL databases in a traditional way. Web data services are a way to access data from trusted sources, but access them in a much more agile way.

Gardner: Those services are coming across in a standardized format that developers can work with using existing tools.

Andreasen: Yes, that's very important. Web data services deliver the data into your standard data warehouse, into your standard SQL databases. Or, as I said earlier, they can wrap those applications into SOAP services, REST services, RSS feeds, and even .NET and Java APIs, so you get the API or you get the data access exactly the way you need it in your BI tool, in your data mining environment, etc.

Gardner: We've established the need. We've looked at the value of increasing BI's purview. We've looked at the larger trends around SOA and bringing lots of different data types into an architecture that can then be leveraged for BI and analytics. We've looked at the need for extending this to business processes outside the organization, as well as data types inside. We've looked at the role of the developer.

Are there examples, Stefan, of people who are actually doing this, who have been early adopters, who have taken the step of recognizing an intermediary and the tool and platform set to manage web data services in the context of BI? And, if they've done that, what are the paybacks, what are the metrics of success?

Andreasen: One of our early adopters is Audi. They've been using our product for five years. What was important for them was that, traditionally, it could take three to six months for them to get access to some data. But, with the Kapow Web Data Server, they were able to access data and create these custom feeds in a much shorter time, days rather than months.

What the business needs

They have been using it successfully for five years. They are growing with it, they're getting a lot of benefit from it, and couldn't imagine running the IT department without web data services today, because it gives them the way to deliver these agile custom data feeds that the business needs.

Gardner: Jim Kobielus, looking to the future, it seems to me that there are going to be more types of data coming from external sources. Perhaps, more of the internal data that companies have used in traditional applications -- BI and integration -- might find itself being housed in server farms, otherwise known as clouds, either on-premises, on some third-party grid or utility fabric, or some hybrid of the two.

When we factor in the movement and expected direction of cloud computing, how does that then bear down on the requirements for managed, governed, and IT-caliber, mission-critical caliber web data service tools?

Kobielus: It simplifies it and complicates it. It simplifies to some degree or enables this vision of self-service BI mashup, with automated source discovery, to come to fruition. You need a lot of compute power, you need a lot of data storage to do things like high volume, real-time text analytics.

A lot of that is going to have to be outsourced to public clouds that are scalable. They can scale out petabytes worth of data or can scale out some massive server farms to do semantic analysis and transformations and the like. So, the storage and the processing for most visions have to be outsourced to cloud providers. To some degree it makes it possible to realize this vision on the back end, at the web data services and data mashup side.

It also complicates it, because now you're introducing more silos. Public clouds are essentially silos from each other. There is Amazon, and there is the Windows SQL data or Azure. Then, of course, there is Google and a variety of others that are providing clouds that don't interoperate well, or at all, with each other. They don't necessarily interoperate out of the box with your existing premises data environment, if you're an enterprise.

So, the governance of all these disparate functions, the coordination of security, and the encryption and so forth across all these environments, as well as the coordination of the data archiving and auditing need to be worked out by each organization that goes this route with a disparate and motley assortment of internal and external platforms that are managing various functions within this analytic cloud.

In other words, it could complicate this whole equation considerably, unless you have one predominant public cloud partner that can do all the data integration, all the cleansing, all the transforms, all the warehousing in their cloud, and can provide you also with this SOA abstraction layer, the semantic virtualization layer, and can also ideally host your advanced analytics applications, like your data mining, in that environment.

It can do it all for you in a very streamlined way, with a common governance, security administration, and data modeling toolset. Remember, end users are a big part of this equation here. The end users can then pick up these cloud-based tools to mash up data within this unified cloud and mash it up in a way that makes sense to end users, not the professional black belt data modelers.

That vision cannot be realized right now with the commercial cloud offerings in the analytic market. I think it will take about two to three to five years for the cloud providers to go this route. It's not there yet.

Gardner: We're about out of time. I want to take the same question to Stefan about the cloud computing angle and the mixed sourcing for applications, datasets, and business processes. It seems to me this would be an opportunity for Kapow.

No master hub

Andreasen: Absolutely. What I don't see is one big vendor that solves all your data needs and becomes like the master hub for all information and data on the Web. History has shown that the way that companies compete with each other is to differentiate themselves.

If everybody was using the same provider and the same kind of data, they couldn't differentiate. This is really, I think, what companies realize today -- unless we do something different and better than our competitors, we are not going to win this game.

What's important with web data services is hosting the tools and the facilities to access the data, but allowing the customers to create in a self-service fashion the custom data feeds they need. Our product fits perfectly into that world as well. We already have many of our customers using our product in the cloud. We become a tool where they can create ad hoc, on demand, or as necessary data feeds, and share them with anybody else that needs them.

Kobielus: I've got one more point. In this ecosystem that's emerging, there's a strong role for providers of tooling specifically focused on self-service mashup and also for what's often called on-demand analytical sandboxing, which could be used by end users to create their own analytic workspace, and pull information.

That means tooling that works in front of whatever the organization's preferred data management, data federation, data warehousing, or BI vendor might be. So there's plenty of opportunity for the likes of Kapow, and many others in this space too, for complementary solutions that are integrated with any of the leading data federation and cloud analytic solutions that are out there.

Gardner: Very good. I'm afraid we'll have to leave it there. We've been discussing the requirements around bringing web data services into BI, but doing so in a mission-critical fashion that's amenable to the IT department.

I want to thank our guests. We've been joined by Jim Kobielus, senior analyst at Forrester Research. Thanks, Jim.

Kobielus: Sure, no problem.

Gardner: We've also been joined by Stefan Andreasen. He's the co-founder and chief technology officer at Kapow Technologies. Thank you so much, Stefan.

Andreasen: Thank you everyone for a great discussion.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions, and you've been listening to a sponsored BriefingsDirect podcast. This is just part of a series of four podcasts on the subjects around web data services and BI.

We look forward to future discussions on text analytics, cloud computing, and the role of BI in the future. Thanks for listening, and come back next time.