Corporate Data Hoarding: Avoid an Intervention with Entity Resolution

Have you seen the TV program “Hoarders”?  It’s a fascinating look into the lives of people who, well, hoard stuff.  A lot of stuff – filling their homes and yards beyond capacity with precarious layers of things that they’ll never use.

Episodes of Hoarders all start at the same point:  The hoarding behavior has completely paralyzed the household and is a health and safety hazard to those in the immediate area.  Local officials step in and lay down an ultimatum:  Clean it up or else. . .

The show then focuses on cleanup of the property and the transformation (or not) of the hoarder.  The hoarder often rationalizes the problem – “it’s not that bad, I just need to make a few tweaks”.  Initial phases of cleanup usually involve close scrutiny of each and every item, and little progress is made.  Eventually, things come to a head and the hoarder either acknowledges that big changes need to be made – or they don’t and face the consequences of losing family and property.

Those who commit to the change are given immediate help to clean up the property.  They are then provided coaching/counseling to address underlying issues and prevent recurrence of the hoarding behavior.

Corporate Data Hoarders

We’re seeing behavior in the corporate world that’s pretty darn similar to what you’d see on Hoarders.  Companies are getting very good at collecting data – most companies’ data stores are doubling every two years.  But less than one half of one percent of that data will ever be used.

In the era of Big Data, executives have been led to believe that their data stores are a digital gold mine: a competitive advantage that can be analyzed to identify opportunities and stay ahead of rivals.  And those executives are right.  There is a lot of value in those piles of data.  But as in a hoarder's home, the time and resources required to search for a particular item often outweigh the value of that item.  These problems compound as the data continues to stack up.

Then comes the ultimatum.  The one event that shines a huge spotlight on the problem and forces executives to come face to face with their data overload issues. Common ultimatum events include:

| Ultimatum Event | Example |
|---|---|
| Mergers & Acquisitions | M&A decisions rely upon the exchange of verified facts and figures that accurately summarize the state of the business.  A simple question like “How many unique customers do you have?” can take some organizations weeks to answer.  There are 500 more questions. . . |
| Compliance Requirements & Legislative Changes | The EU’s General Data Protection Regulation (GDPR) applies to all organizations that process personal data of individuals in the European Union.  If an individual asks you to remove their data from your systems – all of your systems – how long would it take?  There are 500 more individuals making this request each day. |
| Lawsuits & Data Breaches | This one’s a double whammy.  The attorneys don’t care how long it takes to get information.  They’re paid by the hour while your team does the discovery grunt work.  Years later, you’ll be featured in a case study highlighting “negligence” and “poor controls”. |


Starting the Cleanup Effort:  Entity Resolution

Much like the cleanup effort on Hoarders, corporate data overload projects should focus on identifying the data that’s needed, organizing that data, and making it immediately accessible and useful.  That’s where Entity Resolution comes in.

Entity Resolution is the process of finding records, within a single data set or across different data sets, that refer to the same entity.  For example, you may have a customer named “John B. Smith” in your customer database.  You may also have a “John Smith” in your supplier database and an employee named “John Brent Smith”.

Entity Analytics technology looks at these records and determines (based on your criteria) whether they represent the same entity or separate entities.  The net result is a centralized “dossier” of information on the people, organizations, and things that are associated with your business – and a solid foundation for business analytics.
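To make the “John Smith” example concrete, here is a minimal Python sketch of the idea.  It is not how any particular product works.  The record ids, source labels, the extra “Joan Smith” record, and the matching criteria (same first and last name, middle names that don’t contradict each other) are all assumptions made up for illustration.

```python
from itertools import combinations

# Hypothetical records: the three "John Smith" names come from the example
# above; the ids, source labels, and the "Joan Smith" record are invented.
RECORDS = [
    {"id": "CUST-1", "source": "customers", "name": "John B. Smith"},
    {"id": "SUPP-7", "source": "suppliers", "name": "John Smith"},
    {"id": "EMP-42", "source": "employees", "name": "John Brent Smith"},
    {"id": "CUST-2", "source": "customers", "name": "Joan Smith"},
]


def parse_name(full_name):
    """Split a name into (first, middle parts, last), lowercased, periods removed."""
    tokens = full_name.lower().replace(".", "").split()
    return tokens[0], tokens[1:-1], tokens[-1]


def middles_compatible(a, b):
    """Middle names agree if either is missing or one is a prefix of the other."""
    if not a or not b:
        return True
    return a[0].startswith(b[0]) or b[0].startswith(a[0])


def same_entity(rec_a, rec_b):
    """Assumed criteria: equal first and last names, non-conflicting middle names."""
    first_a, mid_a, last_a = parse_name(rec_a["name"])
    first_b, mid_b, last_b = parse_name(rec_b["name"])
    return (
        first_a == first_b
        and last_a == last_b
        and middles_compatible(mid_a, mid_b)
    )


def resolve(records):
    """Group record ids into entities using a small union-find over matching pairs."""
    parent = {rec["id"]: rec["id"] for rec in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if same_entity(a, b):
            parent[find(a["id"])] = find(b["id"])

    dossiers = {}
    for rec in records:
        dossiers.setdefault(find(rec["id"]), []).append(rec["id"])
    return sorted(sorted(ids) for ids in dossiers.values())


if __name__ == "__main__":
    for entity in resolve(RECORDS):
        print(entity)
```

With these criteria, the three “John Smith” records land in one dossier and “Joan Smith” stays separate.  Name-only matching is far too loose for real data, since two different people can easily share a name.  Practical criteria typically weigh additional attributes such as addresses, dates of birth, or identifiers.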

Entity Resolution Options

Most companies have already developed basic Entity Resolution processes in some form or fashion.  Typically, these consist of match/merge algorithms – or high-maintenance spreadsheets/pivot tables –  developed by IT or Analytics staff.

Unfortunately, these algorithms are often limited to a specific department or data source.  For larger enterprises, the need to support and maintain a variety of different match/merge processes only adds to the information overload problem.

Fortunately, there are some off-the-shelf Entity Resolution options available in the marketplace.  Here’s a quick summary of two solutions we’ve implemented for our clients:

| Feature | IBM Identity Insight | Senzing |
|---|---|---|
| Operating System(s) | AIX, HP-UX, Linux, Solaris, Windows | Various Linux Flavors, Windows |
| Database(s) | DB2, Informix, Oracle | DB2, SQLite, MySQL, MariaDB, AWS RDS |
| Hardware Requirements | Defined Here | Defined Here |
| API | Expanded Service SOAP | C, Python, Java, G2Command |
| Data Input | UMF (XML) | .csv, JSON |
| Event Processing | Yes | No |
| Name Recognition | Yes | Yes |
| Address Standardization | Can be integrated | Can be integrated |
| Relationship Resolution | Yes | Yes |
| GDPR Compliance | Custom development | Native utilities included |
| ETL Tools | Typically required for enterprise applications | Visual data mapping utility included.  ETL tools may be required for certain applications. |
| Other Software Requirements | IBM Message Queue (MQ) | |
| Learning Curve | Long.  Administrators will need strong database backgrounds.  Familiarity with SQL, ETL, and matching logic. | Users and administrators with strong database backgrounds |
| Implementation Time (Minimum Config) | 3-6 Months | 2-3 Months |
| Software Cost (5 Million Records, 4x 4-Core CPUs; At published list price) | Approximately $300,000 up front; $60,000 per year for maintenance & support. | Approximately $40,000 per year. |
| Technology Application | Typically deployed in large enterprise environments. | Enterprise (DB2) and departmental environments (Open Source SQL). |
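The list prices in the comparison above can be turned into a rough multi-year total.  The sketch below is back-of-the-envelope arithmetic only.  It assumes that maintenance is billed from the first year, that prices stay flat, and that Senzing has no up-front fee.  It ignores hardware, ETL tooling, and implementation costs.

```python
def cumulative_cost(upfront, annual, years):
    """Total list-price software cost after the given number of years."""
    return upfront + annual * years


# Approximate figures from the comparison table
# (5 million records, 4x 4-core CPUs, published list price).
IBM = {"upfront": 300_000, "annual": 60_000}
SENZING = {"upfront": 0, "annual": 40_000}  # assumption: no up-front fee

if __name__ == "__main__":
    for years in (1, 3, 5):
        ibm = cumulative_cost(years=years, **IBM)
        senzing = cumulative_cost(years=years, **SENZING)
        print(f"{years} yr: IBM ${ibm:,} vs Senzing ${senzing:,}")
```

Under those assumptions, five years of software cost comes to roughly $600,000 for IBM Identity Insight and $200,000 for Senzing.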



Much like on Hoarders, the cleanup effort for organizations suffering from information overload isn’t simple or pleasant.  But it’s necessary.  The pain and costs of the status quo are far greater, because they threaten an organization’s ability to identify opportunities, serve customers, and retain employees.

If you’d like to start the cleanup effort, Alpine can help you formulate a plan and do the heavy lifting.  Let’s talk about your challenges and get started today!

