If you are tuned in to the latest technology concepts around big data, you’ve likely heard the term “data lake.” The image conjures up a large reservoir of water—and that’s what a data lake is, in concept: a reservoir. Only it’s for data.
Data lake defined
A data lake holds a vast amount of raw, unstructured data in its native format.
[ The essentials from InfoWorld: What is big data analytics? Everything you need to know • What is data mining? How analytics uncovers insights. | Go deep into analytics and big data with the InfoWorld Big Data and Analytics Report newsletter. ]
Therefore, all you need is a device that supports a flat file system, which means you can use a mainframe if you want. The data is moved to other servers for processing. Most enterprises go with the Hadoop File System (HDFS), because it is designed for fast processing of large data sets and is used in a big data environment where a data lake is likely to be used.
To read this article in full, please click here