The first wave of enterprise search helped companies tap into the world of text+, sometimes referred to as "unstructured" or "semi-structured" information. Primary drivers included the need to monetize digital content, reduce risk through compliance, or increase employee, customer and partner productivity. These early implementations provided significant value and solved important problems; they also demonstrated limitations that have led to demand for the next generation -- Unified Information Access (UIA).
This article details the five most important reasons the world's most visionary enterprises are upgrading to UIA.
1. Competitive Advantage
A recent survey of legacy enterprise search deployments for internal use found that the best implementations saved users, on average, more than six hours in productivity per week. Less effective deployments averaged less than one hour of savings. What made the difference? The best implementations focused on providing as much information as possible to the end user through a search interface. Those that provided less benefit were driven by concerns about cost, TCO and numbers of supported file formats.
This should not be surprising. Numerous studies over the past two decades have identified information silos as a major barrier to productivity.
But search has significant limitations; it deals primarily with raw unstructured information -- blocks of text. Many popular examples of 'great search' use carefully prepared content such as that found in content management systems; others focus on office documents usually found on file servers. Incorporating information from other enterprise systems such as e-mail, collaboration systems, CRM and ERP systems, etc., can be very challenging -- especially if they offer complex or granular security. Meeting this challenge is the key for those who use productivity as a competitive advantage.
Beyond the impact on end-users, there is the impact on IT itself. Although it can sometimes be done, building a UIA solution from a separate search engine and database can be costly. Simply setting up the two systems can be difficult; adding business logic to let the application use either repository, let alone both at once, adds further complexity. UIA platforms remove most of these barriers and streamline the application's interactions: instead of multiple queries against multiple repositories, a single query retrieves information across all of the sources. The only real work is to render the results. UIA reduces the risk and complexity of a single project. As you implement multiple projects, the savings are magnified dramatically, because the value gained in one project carries over to the next.
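The difference in application code can be illustrated with a toy sketch. Everything here is hypothetical: the in-memory lists stand in for a CRM database and a search index, and `UnifiedIndex` is an invented name, not the API of any real UIA product. The first function shows the glue an application needs when it queries two repositories; the class shows the single call the application would make against a unified layer.

```python
# Hypothetical in-memory stand-ins for a CRM database and a search index.
CRM_ROWS = [
    {"customer_id": "c1", "name": "Acme Corp", "region": "EMEA"},
    {"customer_id": "c2", "name": "Globex", "region": "APAC"},
]
SUPPORT_NOTES = [
    {"customer_id": "c1", "body": "Customer reports billing error on renewal"},
    {"customer_id": "c2", "body": "Asked about onboarding schedule"},
    {"customer_id": "c1", "body": "Follow-up: billing error resolved"},
]


def query_database(region):
    """Structured lookup: customers in a region."""
    return [row for row in CRM_ROWS if row["region"] == region]


def query_search_engine(term):
    """Text lookup: notes whose body mentions the term."""
    return [n for n in SUPPORT_NOTES if term.lower() in n["body"].lower()]


def two_repository_lookup(region, term):
    """Application-side glue: two queries plus a hand-written join."""
    customers = {row["customer_id"]: row for row in query_database(region)}
    results = []
    for note in query_search_engine(term):
        customer = customers.get(note["customer_id"])
        if customer is not None:
            results.append({"customer": customer["name"], "note": note["body"]})
    return results


class UnifiedIndex:
    """Hypothetical unified layer: holds both kinds of record and answers
    a combined structured-plus-text query in one call."""

    def __init__(self, rows, notes):
        self._customers = {row["customer_id"]: row for row in rows}
        self._notes = notes

    def query(self, region, term):
        return [
            {"customer": self._customers[n["customer_id"]]["name"], "note": n["body"]}
            for n in self._notes
            if term.lower() in n["body"].lower()
            and self._customers.get(n["customer_id"], {}).get("region") == region
        ]


uia = UnifiedIndex(CRM_ROWS, SUPPORT_NOTES)
```

Both `two_repository_lookup("EMEA", "billing")` and `uia.query("EMEA", "billing")` return the same two records; the point of the sketch is only that, in the second case, the join logic no longer lives in the application.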
Of course, productivity is just one advantage of UIA. Being able to analyze information of all types very rapidly opens a variety of new doors. For example, it may be possible to create new conversion or profit-driving analytics that use unstructured data created by users during service interactions. It may help make better decisions by bringing in and analyzing the opinion (or sentiment) gathered from both internal and external sources. Or it may create new sales opportunities by supporting complex subscription/access models. Any of these can be a major driver of revenue or profits.
2. New Requirements
It has been said many times that the only constant in life is change. In the past decade the average organization has become increasingly lean and distributed. In such an environment sharing information is essential to efficiency. Will there be more distribution in the next decade, or less? If you believe organizations will be more distributed, then clearly IT is going to have to start dealing with new requirements. Old assumptions will become invalid at an increasing rate. For example, it will no longer be a foregone conclusion that solution X will be deployed on the company's hardware or in the company's network.
Cost and desire are pushing companies towards cloud computing models, but this is just the first step. As organizations become more intertwined with strategic partners and customers, they will need to share data; moreover, they will need to share it across silos.
Imagine the future "real-time 360 degree view" of the customer -- will it be based on the monolithic data warehouse that so many organizations have today? The answer is that it will not. The customer of the future is a 'virtual' one who spans the internal/external boundary and who is represented not just by transactions, but also by text. Legacy technologies are simply not agile enough to manage this data in a timely, distributed fashion, but UIA can. UIA was conceived and designed with these challenges in mind. Information can be of any type, and in almost any format. It can be distant or local. The key is that you get a single API to code against, and a single response to each query that can be rendered. Rendering is the fun, quick part of building applications, and for an IT department awash in new requirements and new complexity, it offers a tremendous relief.
3. Licensing/Business Model
Another reason organizations move to UIA is the business models typically favored by enterprise software companies, especially repository vendors. Whether tied to CPUs, servers, virtual machines, document/data or query volume, the vast majority of licensing schemes require you to spend more and more money as you use more and more information.
With the exploding volume of information, is this really a viable situation?
Simply put...no! Tying the cost of information solutions to the amount of data or content they incorporate, or supply, is counter-productive. It forces managers to choose between hard, quantifiable costs -- the money required for new hardware, for example -- and a more complex benefit such as productivity or better decision making. For some businesses, saving money will be the right choice, but for most, the competitive advantages described earlier are too important to give up. Companies that leave data and content out of important applications because of cost are simply pushing that cost elsewhere -- probably onto their increasingly lean workforces. As companies become more and more distributed, this effect will become a negative cycle: bad decisions lead to poor results, which cut the spending available for making better decisions.
One solution to this challenge is to look at Free & Open Source Software (FOSS). Many companies have done this...and found an entirely new set of implementation, expertise and complexity issues. For some companies it is a good option that avoids the licensing cost question altogether. For the rest there are leading UIA platforms, the majority of which seem to have moved away from variable pricing models -- perhaps realizing that growth of information, as well as increasing use of virtualization, is just part of the macro-picture everyone must adapt to.
4. Scalability
If variable licensing is a big issue for the future of information access, scalability is a huge one. The Economist, among others, has reported that in the coming years enterprises will have to manage an ever-increasing "data deluge." In the same way that paying per CPU or document is untenable, spending even a small amount of effort each time a system must scale will be ruinous. This does not bode well for the traditional relational database, now at least 40 years old (according to Wikipedia). RDBs are notoriously difficult to scale up, especially for query volume and latency. Eventually you may reach the limit of what can be supported on your hardware/software combination and have...no further way to progress without rethinking things.
Of course this is not a new notion; over the past few years numerous alternatives have appeared, from the nascent NoSQL movement, to columnar databases, to massively parallel models. Many of these require trade-offs of some sort, but they are increasingly popular, particularly for specific applications. The key, though, is that most of these options deal only with traditional "structured" database data.
On the other side of the equation are traditional search engines -- much newer than the relational database, and focused on pure unstructured (text) content. Search engines typically can scale, but ease of use is a question. Many older examples require all the hardware needed for the future to be available from the very first moment of operation, and adding hardware or moving content around for any reason can be costly and painful.
UIA offers the ability to store all kinds of information -- structured data and unstructured text -- and scales linearly, using the "shared nothing" model. Leading UIA platforms have focused on "ease of scale" through features like "waterfall scalability" wherein servers are loaded with information until they reach some performance limit, then more are added -- seamlessly -- without the need to re-index or move data around.
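One way to picture the "fill, then add" behavior described above is the toy placement policy below. It is a minimal sketch under stated assumptions, not any vendor's implementation: the `Cluster` class is hypothetical, the "performance limit" is reduced to a simple document count per node, and each node is just a Python list.

```python
class Cluster:
    """Toy 'fill, then add' placement policy (an assumed reading of
    waterfall scalability; real platforms use richer performance limits)."""

    def __init__(self, capacity_per_node):
        self.capacity_per_node = capacity_per_node
        self.nodes = [[]]  # each node holds only its own documents

    def add(self, doc_id):
        # When the newest node reaches its limit, start a fresh node.
        # Documents already placed on earlier nodes are never moved.
        if len(self.nodes[-1]) >= self.capacity_per_node:
            self.nodes.append([])
        self.nodes[-1].append(doc_id)

    def search(self, predicate):
        # Shared nothing: every node answers from its own data,
        # and the partial results are simply concatenated.
        return [d for node in self.nodes for d in node if predicate(d)]


cluster = Cluster(capacity_per_node=3)
for doc_id in range(7):
    cluster.add(doc_id)
```

After the loop the cluster holds three nodes with 3, 3 and 1 documents, and the contents of the first two nodes were fixed as soon as they filled up, which is the property that avoids re-indexing or moving data.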
Over the next few years as we learn the true extent of the "data deluge," being able to scale simply and quickly across any sort of information boundary will be a major advantage.
5. Security
Most major enterprise search conferences have at least one major session on security. Managers who attend are sometimes frightened to hear about things like "late binding" and "hybrid models" and discussions about security breaches that sometimes occur when documents are being reprocessed to reflect permission changes. These are serious issues that simply don't exist over on the "database" side of the house.
The reason is simple: databases solved the security problem several decades ago -- the "right" (or elegant) way. User and group permissions, or access control lists (ACLs) are stored in tables, the end-users' credentials are added to the query that is ultimately sent to the database, and the results contain only the information the user is authorized to see.
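The database approach can be sketched with Python's built-in `sqlite3` module. The table names, column names and sample rows are invented for illustration; the point is only that permissions live in their own tables and the user's identity becomes part of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE memberships (user_name TEXT, group_name TEXT);
    CREATE TABLE acl (doc_id INTEGER, group_name TEXT);

    INSERT INTO documents VALUES
        (1, 'Q3 forecast'), (2, 'Holiday schedule'), (3, 'Q3 board minutes');
    INSERT INTO memberships VALUES
        ('alice', 'finance'), ('alice', 'staff'), ('bob', 'staff');
    INSERT INTO acl VALUES (1, 'finance'), (2, 'staff'), (3, 'board');
""")


def search_titles(user, term):
    """Return titles matching `term` that `user` is allowed to see.

    The user's identity is a query parameter, so unauthorized rows
    never leave the database.
    """
    rows = conn.execute(
        """
        SELECT DISTINCT d.doc_id, d.title
        FROM documents AS d
        JOIN acl AS a ON a.doc_id = d.doc_id
        JOIN memberships AS m ON m.group_name = a.group_name
        WHERE m.user_name = ? AND d.title LIKE ?
        ORDER BY d.doc_id
        """,
        (user, f"%{term}%"),
    ).fetchall()
    return [title for _doc_id, title in rows]
```

Because the ACL is just rows in a table, a permission change is a single `INSERT`, `UPDATE` or `DELETE` on `acl`, and the next query reflects it without touching the documents themselves.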
The key to database security is the notion of "tables" and relationships between them. (The "relational" in "relational database" refers to this exact capability.) Enterprise search engines are not databases, however; they are document-centric, not table-centric. They typically store permissions for a document as fields in the document -- along with common fields like "Title" and "Body" and "Author." As with the database, the users' credentials are added to the query and thus the user matches only documents they are authorized to see -- and it works, so long as the security environment isn't changing. However, a simple act like changing permissions on a folder with a few thousand documents in it can create a massive security breach as the documents are fetched, reprocessed with updated ACLs and then indexed. A single large PDF can take 10 seconds to process; a few thousand of those processed one after another add up to hours (3,000 documents at 10 seconds each is more than eight hours) before the search index matches up with the permissions.
Search vendors have tried to fix this in several ways, most notably using a separate relational database to store the user and group permissions, running the query against the search index without security, then using the database to filter each result as it goes back to the user. There are many unfortunate side-effects of this model, including poor performance, and potential security leaks through spelling suggestions or facets or navigators. (Such useful capabilities are calculated by the search engine, but can't easily be filtered by the database.)
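The facet leak is easy to reproduce in miniature. The sketch below is a toy model with invented data, not the behavior of any particular search product: the index is a list of dictionaries, and a plain dictionary stands in for the external permissions database.

```python
from collections import Counter

# Toy search index; permissions are NOT stored here.
SEARCH_INDEX = [
    {"id": 1, "body": "merger plan draft", "department": "legal"},
    {"id": 2, "body": "merger press release", "department": "marketing"},
    {"id": 3, "body": "cafeteria menu", "department": "facilities"},
]

# Stand-in for the separate permissions database: user -> visible doc ids.
PERMISSIONS_DB = {"bob": {2, 3}}


def post_filtered_search(user, term):
    """Run the query without security, then filter each result afterwards."""
    matches = [d for d in SEARCH_INDEX if term in d["body"]]
    # Facets are computed by the engine over ALL matches, before filtering.
    facets = dict(Counter(d["department"] for d in matches))
    # Each hit is checked against the permissions store on the way out.
    visible = [d["id"] for d in matches if d["id"] in PERMISSIONS_DB.get(user, set())]
    return visible, facets
```

Here `post_filtered_search("bob", "merger")` returns only document 2 in the result list, yet the facet counts still include a "legal" entry, which tells Bob that a legal document about a merger exists even though he cannot open it.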
In the absence of a clear solution, companies often opt to leave secure unstructured information out of internal solutions, never mind trying to provide access to several secure silos at once. Putting unstructured information in a database is equally unappealing. Fortunately, UIA offers the best answer: it models the security data as a database would, keeping documents separate. At query time the users' credentials are used to identify the information they are authorized to see, and this is joined up with the information that matches the query -- including facets and spelling suggestions. There is no need to use a database, accept long security "synchronization issues" or leave secured information out of the index.
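A minimal sketch of this query-time join, again with invented names and data rather than any real platform's API: documents and permissions are held separately, the authorized set is intersected with the matching set, and facets are computed only over the intersection.

```python
from collections import Counter

# Document content, with no permission fields inside the documents.
DOCS = {
    1: {"body": "merger plan draft", "department": "legal"},
    2: {"body": "merger press release", "department": "marketing"},
    3: {"body": "cafeteria menu", "department": "facilities"},
}

# Security data modeled separately, as a database would model it.
DOC_GROUPS = {1: {"legal"}, 2: {"staff"}, 3: {"staff"}}
USER_GROUPS = {"bob": {"staff"}}


def secure_search(user, term):
    """Join 'what matches' with 'what the user may see' at query time."""
    groups = USER_GROUPS.get(user, set())
    authorized = {doc_id for doc_id, allowed in DOC_GROUPS.items() if allowed & groups}
    matching = {doc_id for doc_id, doc in DOCS.items() if term in doc["body"]}
    hits = sorted(authorized & matching)
    # Facets are derived from the authorized hits only, so nothing leaks.
    facets = dict(Counter(DOCS[doc_id]["department"] for doc_id in hits))
    return hits, facets
```

With this data, `secure_search("bob", "merger")` yields document 2 and a single "marketing" facet. Granting access is a change to `DOC_GROUPS` alone, for example `DOC_GROUPS[1].add("staff")`, and it takes effect on the next query without reprocessing any document.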
In summary, by adopting the UIA approach, companies can rapidly combine and leverage information across the enterprise -- secured, unsecured, internal, external, structured and unstructured -- and all points in between.
Sid Probstein has over 15 years of experience in managing R&D organizations and delivering high-value enterprise software and solutions. Prior to his role as CTO at Attivio, he was VP of Technology in the Office of the CTO at Fast Search & Transfer and prior to that was VP of Engineering at Northern Light Technology. He also served as Director of Software Engineering at Freemark Communications, and was the youngest person to hold the title of System Manager at John Hancock Financial Services.