A storage location on cheap hardware that holds files in various formats and structures. You put schema-on-read software on top of it to scan and report on the data.
Data is stored in formats like XML, JSON, CSV, image formats, and proprietary formats as well.
Any data you like: not only what you'd store in a DWH, but variably structured data (e.g. log files in CSV) as well.
Basically, if you can store the data in file storage in a file format, you can store it in a data lake.
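To make "schema on read" concrete, here is a minimal Python sketch. It assumes a temporary local folder standing in for the lake, and the file names, fields, and helper functions are all hypothetical. Files land as-is, and a schema is applied only when something scans them.

```python
import csv
import json
import tempfile
from pathlib import Path


def land_raw_files(lake: Path) -> None:
    """Dump files into the lake as-is; nothing is validated on write."""
    (lake / "logs.csv").write_text(
        "ts,level,msg\n2024-01-01,INFO,started\n2024-01-02,ERROR,disk full\n"
    )
    (lake / "events.json").write_text(
        json.dumps([{"ts": "2024-01-03", "level": "WARN", "msg": "slow query"}])
    )


def read_with_schema(lake: Path, schema: dict) -> list[dict]:
    """Apply the schema only while scanning: select fields and cast their types."""
    rows = []
    for path in sorted(lake.iterdir()):
        if path.suffix == ".csv":
            with path.open(newline="") as f:
                records = list(csv.DictReader(f))
        elif path.suffix == ".json":
            records = json.loads(path.read_text())
        else:
            continue  # unknown formats stay in the lake, just unread
        for rec in records:
            rows.append({name: cast(rec[name]) for name, cast in schema.items()})
    return rows


def demo() -> list[dict]:
    """Land two differently formatted files, then read them through one schema."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        land_raw_files(lake)
        return read_with_schema(lake, {"ts": str, "level": str})
```

A different report could pass a different schema over the same files without rewriting anything in storage, which is the main contrast with a DWH that fixes the schema on write.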
Raw data, in whatever form, is your water. It flows into the data lake. The data lake, unfortunately, has a bunch of water from farms with pesticides and fertilizer, fish shit, random plants, and a bunch of other junk that means you can't consume it yet. But it's a place the water can sit until it flows into your water treatment plants, and from there to places where it can actually be used, drunk, and so on.
“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” That analogy helped me. Source - https://www.forbes.com/sites/bernardmarr/2018/08/27/what-is-a-data-lake-a-super-simple-explanation-for-anyone/
Data Lake = Object storage that stores structured and unstructured data in its raw state + Data catalog to index this data + (optionally) querying capabilities + Security and Access control
One combination - Amazon S3 + AWS Glue + Amazon Athena
Another combination - HDFS + Spark
There are also managed data lake services such as AWS Lake Formation, and open-source storage layers such as Delta Lake that add transactions on top of data lake storage.
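The "object storage + catalog + querying + access control" formula can be sketched as a toy in-memory class. This is only an illustration: `ToyDataLake` and its methods are hypothetical names, not the API of S3, Glue, Athena, or any other real service.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataLake:
    objects: dict = field(default_factory=dict)  # key -> raw bytes (object storage)
    catalog: dict = field(default_factory=dict)  # key -> metadata (data catalog)
    acl: dict = field(default_factory=dict)      # user -> allowed key prefixes

    def put(self, key: str, data: bytes, fmt: str) -> None:
        """Store the object untouched and index it in the catalog."""
        self.objects[key] = data
        self.catalog[key] = {"format": fmt, "size": len(data)}

    def grant(self, user: str, prefix: str) -> None:
        """Allow a user to see every key under the given prefix."""
        self.acl.setdefault(user, set()).add(prefix)

    def find(self, user: str, fmt: str) -> list[str]:
        """Query the catalog, returning only keys the user may access."""
        allowed = self.acl.get(user, set())
        return sorted(
            key
            for key, meta in self.catalog.items()
            if meta["format"] == fmt and any(key.startswith(p) for p in allowed)
        )


lake = ToyDataLake()
lake.put("raw/logs/app.csv", b"ts,msg\n", "csv")
lake.put("raw/images/cat.png", b"\x89PNG", "png")
lake.put("private/payroll.csv", b"x", "csv")
lake.grant("analyst", "raw/")
visible_csv = lake.find("analyst", "csv")  # only CSV keys under "raw/"
```

In a real stack each piece is a separate service, so the storage, the catalog, and the query engine can be swapped independently.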
A data lake is where you dump all your raw data, of all forms. A landing zone, if you will. A DB makes a poor LZ since it enforces a schema up front and limits the file types it can hold.
+1 to everything others have posted. Also, your compute is decoupled from storage, giving you more flexibility in choice of technology and spend. This is in contrast to traditional relational DBs where your compute and storage both depend on the host (server), unless you invest in a shared or replicated architecture, which is costly in itself.