BigData / Apache Spark
Explain Datasets and DataFrames in Apache spark.
A Dataset is a distributed collection of data. Dataset is a new interface added in Spark 1.6 that provides the benefits of RDDs such as strong typing, ability to use powerful lambda functions. A Dataset can be constructed from JVM objects and then manipulated using functional transformations such as map, filter, etc.
A DataFrame is a Dataset organized into named columns which are conceptually equivalent to a table in a relational database. DataFrames can be constructed from a wide array of sources such as structured data files, tables in Hive, external databases, or existing RDDs.
More Related questions...