MarTech Landscape: What's the Difference Between a Data Warehouse and a Data Lake?



It may seem odd to ask marketers whether they want their data kept in something metaphorically described as a building or a body of water.

In this article, part of our MarTech Landscape Series, we examine the characteristics of these two types of massive data storage.

Digital marketers are working more and more with Big Data: huge amounts of raw information from social media, contact centers, online behavior tracking, and other sources. Two of the most common types of storage for large amounts of data are "data warehouses" and "data lakes."

Although marketers typically leave storage decisions to IT, knowing what kind of data storage is in use helps you understand the capabilities and costs of your systems.

Data Warehouses

A data warehouse is used to store data that is generally structured for databases at the time of entry. The data often comes from operational systems: transactions, customer records, human resources, customer relationship management systems, enterprise resource planning systems, and so on. It is usually carefully sorted and prepared before being stored in a warehouse, which is often the preferred mechanism if the information is legally binding and must be traceable.

A warehouse can also store unstructured data, such as body-camera footage from police officers, said James D'Arezzo, CEO of storage performance provider Condusiv Technologies. Although this type of data is usually not structured for a database, it can be kept as a list of files. But, like the physical structures they are named after, data warehouses are primarily designed to store data that has been sorted, filtered, and packaged on entry.
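To make "sorted, filtered, and packaged on entry" concrete, here is a minimal Python sketch of the idea, often called schema-on-write. The table layout, field names, and sample records are hypothetical and chosen only for illustration; a real warehouse would do this work in its loading pipeline rather than in a script like this.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PurchaseRow:
    """Hypothetical warehouse row: every field is required and typed."""
    customer_id: int
    purchase_date: date
    amount_usd: float


def load_into_warehouse(raw_records, table):
    """Validate and convert each record BEFORE it is stored.

    Records that do not fit the schema are returned as rejects
    instead of being written to the table.
    """
    rejected = []
    for record in raw_records:
        try:
            row = PurchaseRow(
                customer_id=int(record["customer_id"]),
                purchase_date=date.fromisoformat(record["purchase_date"]),
                amount_usd=float(record["amount_usd"]),
            )
        except (KeyError, TypeError, ValueError):
            rejected.append(record)
            continue
        table.append(row)
    return rejected


warehouse_table = []
incoming = [
    {"customer_id": "101", "purchase_date": "2018-03-01", "amount_usd": "19.99"},
    {"customer_id": "102", "purchase_date": "not a date", "amount_usd": "5.00"},
    {"comment": "loved the new store layout!"},  # free text has no place in this table
]
rejected = load_into_warehouse(incoming, warehouse_table)
print(len(warehouse_table), "rows stored;", len(rejected), "records rejected")
```

The preparation cost is paid up front: anything that does not match the expected structure never reaches storage, which is what makes the stored data easy to query and trace later.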

Data Lakes

As their name indicates, data lakes are more amorphous than warehouses. They store all kinds of data from all sources, including video streams, audio streams, facial recognition data, social media posts, and the like.

Lakes sometimes use artificial intelligence to characterize incoming data, for example by naming it, but formatting, processing, and data management are typically done when the data is exported for a specific need, not before it is stored. While warehouses are generally much more selective about the types of data they admit, lakes accept almost everything.
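The lake approach, often called schema-on-read, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product works: the `store_in_lake` and `export_purchases` helpers, the source labels, and the sample payloads are all hypothetical.

```python
import json
from datetime import datetime, timezone


def store_in_lake(lake, payload, source):
    """Append the payload as-is, with only minimal descriptive metadata."""
    lake.append({
        "source": source,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "raw": payload,  # kept as a raw string; no schema is enforced here
    })


def export_purchases(lake):
    """Apply structure only at export time, for one specific need."""
    rows = []
    for item in lake:
        if item["source"] != "web_orders":
            continue
        try:
            order = json.loads(item["raw"])
            rows.append((int(order["customer_id"]), float(order["amount_usd"])))
        except (KeyError, TypeError, ValueError):
            continue  # skip items that cannot be read as purchases
    return rows


lake = []
store_in_lake(lake, '{"customer_id": "101", "amount_usd": "19.99"}', "web_orders")
store_in_lake(lake, "loved the new store layout!", "social_posts")
store_in_lake(lake, '{"device": "kiosk-7", "temp_c": 21.5}', "iot_sensors")

print(len(lake), "items stored")
print(export_purchases(lake))
```

Everything is accepted at storage time, and the work of deciding what the data means is deferred until someone asks a specific question of it.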

Although lakes are not necessarily faster at accepting or processing data, D'Arezzo told me, their data managers are not forced to create new structures and criteria for accepting data. For a marketer, he added, lakes translate into greater depth and more data sources than a warehouse offers.

Why does this matter to marketers?

Data management systems can use both warehouses and lakes, or they can focus on one type or the other. D'Arezzo recommends that marketers understand the type of storage where their data resides, the available analysis tools, the integration with systems capable of acting on the data, the costs, any performance problems, and whether the storage resides on the company's premises, in the public cloud, in the company's private cloud, or in some combination of these.

In terms of cost, preparing data before storing it in a warehouse can be expensive and time-consuming. Warehouses traditionally store their huge amounts of data on slow, inexpensive magnetic tape, while lakes often use commodity disks.

D'Arezzo also notes that marketers sometimes do not really know what they want to do with data before it is stored, so it can be difficult to prepare that data for an unknown purpose. Facial recognition data, social media posts, or data from Internet of Things devices, he said, may fall into this category, where it might be better to store first and decide later.

Warehouse providers include IBM, Google, Microsoft, Teradata, and SAP, while lake vendors include AWS, Microsoft, Informatica, and Teradata.

This story was first published on MarTech Today.

About the author

Barry Levine covers marketing technology for Third Door Media. Previously, he covered this space as a senior editor for VentureBeat, and he has written about these and other technical topics for publications such as CMSWire and NewsFactor. He founded and led the web site/unit at PBS station Thirteen/WNET; worked as an online Senior Producer/Writer for Viacom; created a successful interactive game, PLAY IT BY EAR: The First CD Game; founded and led an independent film showcase, CENTER SCREEN, based at Harvard and M.I.T.; and served for five years as a consultant to the M.I.T. Media Lab. You can find him on LinkedIn, and on Twitter at xBarryLevine.