Automating Data Analysis Is a Must for Midsize Businesses

Midsize company leaders are right to be excited about the opportunities for harnessing the value in their large datasets. But the data in midsize companies tends to be messy: spreadsheets and plain-text files, many in different formats, are difficult (if not impossible) to integrate. Cleaning that data up enough to make it useful takes a lot of time and money. Poor-quality, fragmented data can sabotage even the best initiatives, including AI designed to increase value and efficiency. HdL Companies, a government services firm headquartered in Brea, California, used its data strategically and has seen significant efficiency gains. The author offers three lessons for leaders to consider when getting started with automating data analysis.

As midsize companies grow, they develop data flows and data lakes (repositories for both structured and unstructured data) that are too big for one person, or even a team, to manipulate and use effectively. And even if a company is currently deriving value from its data, the people doing the work might move on, leaving the business tasked with having to find, attract, and hire expensive data analysts in a hurry.

Having a capable, up-to-date enterprise resource planning system (ERP) won’t solve the problem or relieve the pressure. Most midsize companies begin with finance-focused ERPs and wind up bolting on systems to store other data, such as customer activity and manufacturing throughput — a move that’s more operational than strategic.

Consequently, automating data analysis as the business grows is a sound investment. In automation, programmers write algorithms that perform previously manual tasks exactly as instructed. Doing so pays dividends quickly, drives innovation and further growth, and paves the way to implementing artificial intelligence, which can make many tasks easier, more efficient, and more cost-effective. AI differs from plain automation in that it is coded to learn to perform a task, in some sense inventing and writing its own algorithms.

But the data in midsize companies tends to be messy. Spreadsheets and plain-text files, many in different formats, are difficult if not impossible to integrate. It takes a lot of time and money to clean them up to make them useful. Poor-quality, fragmented data can sabotage even the best initiatives, including AI designed to increase value and efficiency.

As Joe Pucciarelli, group VP and IT executive advisor at the market research company International Data Corporation (IDC), said in a recent Channel Company webinar, “Most organizations’ data sets are not in great condition. We talk about data and analytics as a strategy and priority, but the data isn’t ready to support it.…Most organizations, when they’re trying to solve a problem, the analyst who’s working on it typically spends 75%+ of the time…simply preparing the data.” 

As you might imagine, the ROI on the time spent doing that is not good. Let’s look at how one midsize company harnessed the value in its data and explore three steps midsize business leaders can take to do the same.

How One Midsize Company Dealt with Its Data

One of my clients, HdL Companies — a government services firm headquartered in Brea, California — is engaged by municipalities in California, Texas, and other states to analyze their respective states’ distribution of sales tax revenue to ensure that their city or town is getting its fair share. HdL looks for misallocations and discrepancies that municipalities can point to when petitioning the state for redress. The heart of this work is comparing different databases to expose discrepancies that affect who should get sales tax revenues. For example, in one database a business might be listed in Dublin, CA, but in two other databases it could be listed in neighboring Pleasanton. That makes a tax-allocation error highly likely; HdL’s job is to ferret it out.
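To make the idea of comparing databases concrete, here is a minimal sketch in Python. It is not HdL's actual method, which the article does not describe in detail. The function name, the business IDs, and the three sample sources are all hypothetical, and the sketch assumes every source identifies a business by the same ID.

```python
from collections import Counter


def find_location_discrepancies(*sources):
    """Flag businesses whose registered city differs between sources.

    Each source is a dict mapping a business ID to a city name
    (an assumed, simplified layout). Returns a dict mapping each
    disputed business ID to a Counter of the cities reported for it.
    """
    discrepancies = {}
    all_ids = set().union(*(source.keys() for source in sources))
    for business_id in sorted(all_ids):
        # Normalize spacing and capitalization so "BREA" and "Brea" match.
        cities = Counter(
            source[business_id].strip().title()
            for source in sources
            if business_id in source
        )
        if len(cities) > 1:
            discrepancies[business_id] = cities
    return discrepancies


# Hypothetical sample data standing in for three separate databases.
state_tax_records = {"B-1001": "Dublin", "B-1002": "Brea"}
city_licenses = {"B-1001": "Pleasanton", "B-1002": "Brea"}
address_directory = {"B-1001": "pleasanton ", "B-1002": "BREA"}

flagged = find_location_discrepancies(
    state_tax_records, city_licenses, address_directory
)
for business_id, cities in flagged.items():
    print(business_id, dict(cities))
```

In this toy example only business B-1001 is flagged, because one source places it in Dublin and two place it in Pleasanton. A real comparison would also have to match records that lack a shared ID, which is usually the harder part of the work.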

California’s 40 million residents buy taxable products from 5.9 million licensed resellers, creating a massive dataset of nearly 46 million tax records in 2020. For years HdL employed analysts to pore over such data every quarter, looking for mistakes. HdL’s IT group created software to help, but over the years its analytics team adopted many idiosyncratic manual techniques, and the IT group faced a long backlog of work to build those techniques into the code base. Coping with the backlog was delaying HdL’s automation projects and the development of new techniques to surface tax discrepancies more efficiently. At the same time, the state of California was making its own improvements, leaving fewer discrepancies that could be found using HdL’s old tools. “Our team is always finding new analytical techniques to identify hard-to-find misallocations,” says Matt Hinderliter, director of audit services at HdL. “However, we have been heavily reliant on manual exports and manipulation of data in Excel as well as the need to have senior-level analysts manually review spreadsheets that often exceed 70k or 80k rows of data.”

To deal with both the external (California’s improvements) and internal (HdL’s overloaded IT department and laborious manual analysis) stressors, HdL — a midsize company with a midsize budget — hired a talented intern who was earning her master’s degree in data analytics full-time. She was able to turn some of the analytical processes team members used to identify potential misallocations into algorithms that could generate more tax revenue reallocation opportunities in a fraction of the time.

Given this efficiency gain, one might assume that HdL would be considering layoffs. Instead, its audit department is staffing up to pursue all the opportunities the automated analysis has surfaced. And HdL has moved closer to implementing and deploying AI.

Improving operational efficiency is almost always a top priority for midsize companies. In a Channel Company survey of middle-market IT leaders, 75% of whose firms have $50M to $1B in revenue, 58% of respondents said their top priority was improving operational efficiency. That far exceeded their second priority, increasing new revenue (36%). Both goals can be supported by automating data analytics, as they were at HdL.

Getting Started

Midsize companies can’t tackle every opportunity. Their budgets and workforces and the hurly-burly of day-to-day operations won’t allow it. (They’re not Google, after all.) So midsize companies should begin automating their data analysis processes by focusing on areas where critical operations are either inefficient or too dependent on one person or a handful of people. Before automating, HdL had 15 people spending a significant chunk of their time doing what algorithms are doing today.

HdL was already doing data work; many businesses — printers, plumbing suppliers, and so on — are not. But those companies are still amassing data, and they can benefit by using it strategically. It’s important to begin with a solid foundation. Here are three things for leaders to consider when starting to automate data analysis.

Prioritize cleanup. Data in a midsize business is typically messy and needs a lot of tidying before it can become useful. Another foundational activity is identifying which data is important and then scrubbing it. This can be slow work at first, and it is not cheap, so find areas where the business can earn a payback in the first year. That will turn skeptics into believers.

Hire the right people. Executives are not analysts. They lack the time, patience, and skills to do data analysis as an add-on to their everyday duties. Business analysts, who are part programmer and part businessperson, are better suited to the work. HdL started with an intern and later hired her as a full-time business analyst.

Prepare the data. Only when your data is thoroughly prepared can you start thinking about AI. AI creates its own logic from an analysis of the patterns it discovers in the data. Although AI and machine learning are useful and exciting, both technologies need large datasets upon which to train, with confirmed positive and negative outcomes. After enough data cleansing and a few algorithm-based sweeps, most midsize companies will have a sufficiently large and useful dataset on which to train an AI model.
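As an illustration of the cleanup step, here is a minimal sketch in Python of pulling a spreadsheet export and a plain-text file into one consistent format. Everything in it is an assumption made for the example: the column names, the sample rows, and the helper functions `normalize_record` and `load_delimited` are hypothetical, and real files will need many more rules than these.

```python
import csv
import io


def normalize_record(raw):
    """Tidy one record: collapse extra spaces, fix capitalization,
    and turn a currency string such as "$1,250.50" into a number."""
    name = " ".join(raw["name"].split()).title()
    city = raw["city"].strip().title()
    amount = float(raw["amount"].replace("$", "").replace(",", ""))
    return {"name": name, "city": city, "amount": amount}


def load_delimited(text, delimiter, field_map):
    """Read delimited text and rename its columns to a shared schema.

    field_map maps each shared field name to that file's column header.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for row in reader:
        raw = {target: row[source] for target, source in field_map.items()}
        rows.append(normalize_record(raw))
    return rows


# Hypothetical spreadsheet export (comma-separated).
spreadsheet_export = '''Business Name,City,Sales
acme  hardware, dublin ,"$1,250.50"
'''

# Hypothetical plain-text file (pipe-separated, different headers).
text_file = '''name|town|amount
BAY PRINTERS|PLEASANTON|980
'''

records = load_delimited(
    spreadsheet_export, ",",
    {"name": "Business Name", "city": "City", "amount": "Sales"},
) + load_delimited(
    text_file, "|",
    {"name": "name", "city": "town", "amount": "amount"},
)

for record in records:
    print(record)
```

The design choice worth noting is the per-file `field_map`: each source keeps its own headers, and only the mapping changes, so adding another file format does not require touching the cleanup rules themselves.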

Midsize company leaders are right to be excited about the opportunities to harness the value in large datasets. Now is the time to get started on this multiyear journey and commit to hiring the right talent while taking incremental steps to produce value from data automation and other types of advanced analytics.