How Much Does a Modern Data Warehouse Solution Cost?

What is a modern data solution?
 
What is the difference between a modern data warehouse and a modern data mart?
 
How much does it cost?
 
Good questions.  No easy answers!  But we’ll give it a try and paint a picture.
 
 
What is a modern data solution?  
 
“Data solution” is a deliberately broad term.  Sometimes clients do not have the budget or timeline to implement an enterprise data warehouse, data lake, and/or data lakehouse solution.  Sometimes a smaller, more specific, or faster-to-implement project is called for.  Many consider that kind of project a “modern data solution”.
 
 
What type of solution is my project?
 
        •   How many subject area data sets will be delivered with the solution on Day 1?  A subject area here corresponds to one application data set being sourced.  Applications in this sense means the core applications used to run your business (e.g. ERP, CRM, LMS, etc).
        •   How fully formed will the presentation layer (w/ backend database) be?
        •   Are we talking about an operational environment – perhaps straight queries off of replicated 3NF and/or JSON structures fresh off the application?
        •   Do the subject areas require data integration before presentation?
        •   Will there be data modeling and preprocessing before the data is served in an analytical environment (conformed analytics system), or will the data analysts need to prepare complex queries and hope everyone subscribes to the same approach (conformed analytics process)?
        •   What existing infrastructure can be repurposed to become part of this data project?
        •   What team is in place?  Does there need to be hiring?  For what roles?
        •   Is there top-down buy-in?  Who are the sponsors?  
        •   How will data governance work?  
        •   Is there a roadmap for this project?  For overall data management?
 
 
Determining readiness and preparation
 
It is critical to the success of any data project that most of the questions listed above have specific answers before any implementation begins.  Teams that cannot confidently answer them might benefit from an assessment phase, whether run internally or through a third party.  During an assessment, you will want to define as many of these variables as possible: sketch out how much data, what team, how much time (when), and how much buy-in you have (who is the executive sponsor, not just the project sponsor).

This last part is important – the who.  I would estimate that the majority of projects that go off track are those that do not have strong, top-down sponsorship.  It is imperative that someone with authority over both budget and staff leads any data project, to ensure adequate resources are available and to follow through to completion.  The most frequent commonality among failed data projects is multiple sponsors connected by dotted organizational lines, with no true ownership at the top.  This situation too often leads to conflicts of interest, lack of responsibility, disappearing budget, and resources pulled onto other priorities midstream.

This top-down sponsor should also be able to make tough strategic decisions.  Very few businesses are not data-driven today.  If there is no organizational realization that this project is central to the organization’s business, then stop and take stock.  That situation calls for an assessment of goals and a better-defined project plan.
 
 
Next steps
 
So, you made it this far.  You have executive, top-down buy-in for your project.  You have answers to more than half of the original questions.  Even so, you may still need an assessment phase.  This could be roughly 1/5 of your project timeline.  Why do it?  A number of reasons.
 
    •   You may not be confident about your team.  Do you have team members to cover all aspects of the desired project?
    •   Do you know the considerations and risks of going after certain subject areas first, second, or third?
    •   Have you assessed which tools and system resources you will use for the project?  If not, have you made a shortlist of three tools to evaluate in each category?
    •   Do you know how many subject areas you’d like to bring into the data solution on Day 1?
    •   Do you have modern competencies on staff?  A number of suggested features and add-ons round out the modern data solution, including but not limited to a process and system approach for a data governance layer, and CI/CD (continuous integration/continuous deployment) for the engine that makes it go.  The latter is the process and system approach known as DataOps, the offspring of DevOps, born during the wave of digital transformations over the past several years.
 
For a single-subject-area data solution, the assessment might not take more than a few weeks, drawing on a part-time data architect plus a part-time PM/BA.  You could piggyback some of this onto the sales cycle of your primary software vendor and/or system integrator.  Multiple subject areas might require a month, leveraging a part-time solutions architect, a data architect, a data analyst, and a project manager.
 
 
So how do I attach costs to this?
 
Costs will vary widely depending on scope of project and tools selected.  However, there are some aspects of pricing out a data project that are somewhat consistent:
    •   What’s my people cost going to be?  Whether FTEs, consultants, contractors, or a mix of all of the above, this will require multiple months of commitment.  For a single subject area, getting to a minimalistic, modern data solution is still going to require four months, and that’s just to get to a stabilization period.  On average it shouldn’t take more than three months to stabilize the data sets from a single application that is central to running your business; however, the first time around may require more (don’t forget the assessment period, unforeseen dependencies, and provisioning time).  Your software vendor or system integrator might suggest a pilot project of 2-4 weeks.  Sure, that’s fine; just realize that they are cherry-picking the data that is easiest to extract, load, and present.  This falls under the umbrella of software activation or proof of concept, but it is not an MVP.
    •   What about tools?  Whether data solution or full-blown modern data warehouse/lakehouse, you’ll need to provision modern tools and services, which together make up the foundation for the modern data platform.  Here is the grocery list:
        o   Cloud service – the big three are AWS, Azure, and GCP.  
        o   Data warehouse as a service, not just commodity storage and/or a database technology.  Examples are Snowflake, Redshift, BigQuery, and Databricks.  Which one?  You’ll know after your assessment period (e.g. what the nature and mix of your data will be: structured, semi-structured, unstructured).
        o   ELT/ETL technology.  This handles the extract, load, and transform of the data set to be ingested/replicated into the data solution.  Sometimes EL is decoupled from T in order to save money and push conventional ETL processing down to your (often cheap) cloud data warehouse platform.
        o   BI solution e.g. Tableau, Power BI, or Looker to name a few.  
        o   Then there are extras.  A data catalog tool/service might be your central tool for managing your data governance layer.  Business science, anyone?  If you have use cases for data science down the road, you could get a jump on them with a business science tool that provides some out-of-the-box analytics beyond BI (business intelligence) reporting and dashboarding.
 
 
Stop the techno babble! What am I into this for?
 
Well, down to brass tacks, I’d have to qualify everything I’m saying here by reminding you that this is just a blog post!  You’ll need to dig a bit deeper to assess costs for your project – and we can certainly help you with that – but just as an example of a data warehouse, here’s a ballpark:  
 
So I’m going to assume 2x major subject areas (e.g. a significant Salesforce data set and a significant Workday data set).  Also assume 6x incidental, smaller data sets (smaller topics/extracts only), perhaps with a few dozen user-defined files and inputs as well.  A project like this would typically involve 1 month of planning/assessment and then 3 months for each major subject area (including the smaller data sets).  Also assume you have a team of three core resources plus a part-time project manager.  That’s 3x FTEs for seven months.  For tools, this project would implement a cloud service, a DW service, a replication tool, and a presentation tool (note I’m not including data catalog or business science).  You could do this in less time with more resources, perhaps 3x resources per subject area, but even with a reduced timeline it would take four months minimum.  Implementing on the longer timeline might be slightly cheaper because you’d ramp up only once, with a single team of 3x resources; of course, you’d lose some coverage benefit, especially if you are doing this entirely in-house.  But there is certainly some flexibility in how a project like this could be implemented.
 
To provide some hard numbers, we need to make some assumptions.  For this example, we assume that the Salesforce and Workday data sets use out-of-the-box ELT connectors.  If custom connectors are required, then your data engineer will earn their salt and need two additional weeks per connector.  If your data engineer is inexperienced, then custom connectors could take upwards of four to six weeks per connector!  Help with business requirements gathering, business analysis, and subject matter expertise will likely require a part-time BA (for each subject area) from outside your team and sphere of influence.  This is where a mandate from your top-down sponsor could be helpful.
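
The timeline rules of thumb above can be sketched as a few lines of Python.  This is only a back-of-the-envelope illustration: the function name, the parameter names, and the 4-weeks-per-month conversion are my own assumptions, and it treats custom connector work as adding directly to the end of the timeline, which your project plan may or may not do.

```python
# Rough timeline sketch based on the rules of thumb in this post.
# Assumption (not from the post): 4 weeks per month, and custom
# connector work simply extends the timeline rather than running
# in parallel with other work.
WEEKS_PER_MONTH = 4


def estimate_timeline_months(
    major_subject_areas: int,
    custom_connectors: int = 0,
    weeks_per_custom_connector: float = 2.0,
    assessment_months: float = 1.0,
    months_per_subject_area: float = 3.0,
) -> float:
    """Return an estimated project duration in months."""
    base = assessment_months + months_per_subject_area * major_subject_areas
    connector_months = (
        custom_connectors * weeks_per_custom_connector / WEEKS_PER_MONTH
    )
    return base + connector_months


if __name__ == "__main__":
    # Two major subject areas, out-of-the-box connectors: 1 + 2 * 3 months.
    print(estimate_timeline_months(2))  # 7.0
    # Same project, two custom connectors, experienced data engineer.
    print(estimate_timeline_months(2, custom_connectors=2))  # 8.0
    # Same project, inexperienced engineer at six weeks per connector.
    print(estimate_timeline_months(2, 2, weeks_per_custom_connector=6))  # 10.0
```

Changing `major_subject_areas` to 1 gives the four-month minimum mentioned earlier for a single subject area.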
 
 
So the tab is:
 
For the example I’ve outlined above, assuming the major subject areas use the out-of-the-box ELT tools that are included with the applications, a rough cost estimate would be as follows:
 
    •   4 tools/services at $25K each (on average) for 1 year = $100K
    •   3x core resources (data architect, data engineer, data analyst) plus a part-time PM for 7 months = $675K
    •   Total cost = $775K
 
Alternatively, for the data mart solution (a single major subject area on the four-month timeline):

    •   4 tools/services at $25K each (on average) for 1 year = $100K
    •   3x core resources (data architect, data engineer, data analyst) plus a part-time PM for 4 months = $400K
    •   Total cost = $500K
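
If you want to plug in your own numbers, the arithmetic behind both tabs can be sketched like this.  The function and variable names are made up for illustration; the dollar figures are the ballpark ones from this post, not quotes.

```python
# Ballpark cost arithmetic for the two examples in this post.
# All names here are illustrative; swap in your own figures.


def tooling_cost(tool_count: int = 4, avg_cost_per_tool: int = 25_000,
                 years: int = 1) -> int:
    """Tools/services priced at an average annual cost per tool."""
    return tool_count * avg_cost_per_tool * years


def total_cost(staffing_cost: int, tools_cost: int) -> int:
    """Total project cost = people + tools."""
    return staffing_cost + tools_cost


# Staffing figures as quoted in the post (3x core resources + part-time PM).
WAREHOUSE_STAFFING = 675_000  # 7 months
DATA_MART_STAFFING = 400_000  # 4 months

tools = tooling_cost()
warehouse_total = total_cost(WAREHOUSE_STAFFING, tools)
data_mart_total = total_cost(DATA_MART_STAFFING, tools)

if __name__ == "__main__":
    print(f"Tools:           ${tools:,}")            # $100,000
    print(f"Data warehouse:  ${warehouse_total:,}")  # $775,000
    print(f"Data mart:       ${data_mart_total:,}")  # $500,000
    # Implied monthly cost of the team in each example.
    print(f"Team per month:  ${round(WAREHOUSE_STAFFING / 7):,}"
          f" to ${round(DATA_MART_STAFFING / 4):,}")
```

Note that the two staffing figures imply a team cost of roughly $96K to $100K per month, which is a handy number to sanity-check against your own rates.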
 
 
Notes:
 
*I’m not going down the path of explaining active-row calculations, or answering questions like why my presentation tool costs more than my ELT tool, or why it is expensive to ingest my logging data through an ELT tool.  The assumption here is that you are a small to medium-sized company with small to medium-sized amounts of operational data!  If you are a large company and do not understand any of these comments, then seek help immediately!
 
**If you are doing this all in-house, or if you are directing contractors 100%, then you can most likely divide the staffing cost by 2; if you are a large organization and/or using a large consulting partner, then you may need to multiply by 2 to account for larger data volumes and corporate structure.
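
As a rough illustration of that adjustment, here is a small sketch that applies the divide-by-2 or multiply-by-2 rule to the staffing portion only, leaving tools at $100K.  The delivery-model labels and the function name are my own shorthand, not an industry standard.

```python
# Staffing adjustment sketch: only the people cost is scaled,
# the tooling cost stays the same. Labels are illustrative.
STAFFING_FACTOR = {
    "baseline": 1.0,
    "in_house_or_contractors": 0.5,     # divide staffing cost by 2
    "large_org_or_large_partner": 2.0,  # multiply staffing cost by 2
}


def adjusted_total(staffing_cost: int, tools_cost: int,
                   delivery_model: str) -> int:
    """Scale the staffing cost by delivery model, then add tools."""
    factor = STAFFING_FACTOR[delivery_model]
    return int(staffing_cost * factor) + tools_cost


if __name__ == "__main__":
    tools = 100_000
    for label, staffing in (("Data warehouse", 675_000),
                            ("Data mart", 400_000)):
        for model in STAFFING_FACTOR:
            total = adjusted_total(staffing, tools, model)
            print(f"{label:15} {model:27} ${total:,}")
```

Under these assumptions the data warehouse example ranges from about $437.5K to $1.45M, and the data mart example from about $300K to $900K.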
 
***At the end of the project, if you have staff in place, then you could support the solution in-house.  If you have not staffed or hired up by the end of the project, you will need a stabilization period requiring additional consultant or contractor time.
 
****Remember that this includes the purchase of tools that can be used on future projects as well as this one and involves FTEs who will be able to implement future projects once the first project is complete and in maintenance mode.
 
 
Lessons learned
 
Building a data mart can be a good way to test the waters of commitment in your organization with lower cost, faster implementation, and more limited results.
 
A good assessment is worth its weight in gold in determining the potential outcome of a project.
 
Costs can vary widely based on the types of resources used and the scope of implementation and should be examined closely as part of any assessment.