Unstructured Enterprise Data: A mosaic of value or a maze of confusion?

Simply put, unstructured data is data that does not fit neatly into the records and fields of a database. It often resides in file storage systems rather than in a database. Because it does not conform to a set format, it is somewhat harder to sort, manipulate and analyse. While each individual file or message may conform to a standard (generally defined by the software used to create or house the data, such as Gmail, Outlook, Excel or Word), the information contained in the file itself has no pre-defined format. The creator has free rein to format the data and information as they see fit, within the overall constraints of the software being used.

  • Examples of unstructured data include spreadsheets, YouTube videos, emails, text messages, tweets, Word documents, PowerPoint presentations, internet posts and blogs, data held on smartphones through the use of apps, and so on. The volume of unstructured data is expected to continue to expand with no end in sight.

As an individual, and probably without realising it, you are constantly generating a stream of unstructured data. Each time you send an email, post a tweet, update Facebook, or upload a photo from your smartphone and save it to your Dropbox folder, you are adding to the rivers of unstructured data.

In the workplace, the ability to create a PowerPoint presentation, spreadsheet or Word document at will, then store the file somewhere on the organisation’s IT infrastructure, cloud or otherwise, adds to the organisation’s never-ending mountain of unstructured data.

Value in unstructured data

By way of example, in most organisations the trusty Excel spreadsheet is often the weapon of choice for accountants and managers drawing up budgets. Excel is sometimes seen as an example of uncontrolled technology in business, which can have serious consequences if the logic or data is not checked! In another part of the organisation, dozens of Word documents are being created that may contain sensitive commercial information, confidential personal data, or other information of importance. The sum total of the potential value of the data and information contained in these disparate repositories could, over time, become significant. The challenge facing organisations with significant amounts of invisible, unstructured data is to implement the controls and processes to categorise this information. When you engage third parties, such as contractors and IT consulting firms, what unstructured data are they generating, and is it of interest to you?

However, identifying, collating and analysing the contents of the files and various data streams can present a real challenge, because there is no common standard with which to collate all the data. This is where specialist IT consulting organisations with proven expertise and technologies can help, by setting up solutions that ‘crawl’ all of your organisation’s internal (and cloud) information sources and collate a comprehensive taxonomy. Knowing what’s out there is the first, and probably most important, step in this journey.
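
To make the idea of a ‘crawl’ a little more concrete, here is a minimal sketch in Python of what a first pass over a single file share might look like. It is an illustration only, not a description of any particular vendor's product: the extension-to-category mapping and the function names are assumptions made for this example, and a real taxonomy would look inside the files rather than rely on file extensions alone.

```python
import os
from collections import Counter
from pathlib import Path

# Hypothetical mapping from file extension to a coarse category.
# A real taxonomy would be far richer and would inspect file contents.
CATEGORIES = {
    ".xls": "spreadsheet",
    ".xlsx": "spreadsheet",
    ".csv": "spreadsheet",
    ".doc": "document",
    ".docx": "document",
    ".pdf": "document",
    ".ppt": "presentation",
    ".pptx": "presentation",
    ".eml": "email",
    ".msg": "email",
}


def categorise(path):
    """Return a coarse category for a file, based only on its extension."""
    return CATEGORIES.get(Path(path).suffix.lower(), "other")


def crawl(root):
    """Walk the tree under `root`; return (path, category, size_in_bytes) tuples."""
    inventory = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full_path)
            except OSError:
                continue  # skip files that disappear or cannot be read
            inventory.append((full_path, categorise(full_path), size))
    return inventory


def summarise(inventory):
    """Count how many files fall into each category."""
    return Counter(category for _path, category, _size in inventory)
```

Calling summarise(crawl(...)) on the root folder of a shared drive would give a rough count of spreadsheets, documents, presentations and so on. That is only the 'what's out there' step; judging the sensitivity or value of each file is a separate and harder exercise.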

Additionally, how well you secure this data slum has a direct bearing on the retention of your sensitive or competitive information. It is often this unstructured data that contains the rationale for your executive decision making, organisational strategies and investment plans – much of which may be worth more to others than it may be to you.

The question is: in your organisation, what information is out there, in what form, and what value does it represent?

If you don’t know, how will you ensure compliance with any current or proposed mandatory breach reporting legislation (such as the Australian Privacy Amendment (Privacy Alerts) Bill 2013)?