This guide provides a curated list of common Apache Hive interview questions to help you prepare for your next Data Engineer or Big Data Developer role. Master these concepts to demonstrate your expertise in data warehousing and large-scale data analysis.
Last Updated: Aug 30, 2025
Core Concepts & Architecture
1. What is Apache Hive?
Apache Hive is a data warehouse system built on top of Apache Hadoop for data query and analysis. It provides an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop, converting those queries into MapReduce, Tez, or Spark jobs. You can learn more from the official Apache Hive website.
2. What is the difference between Hive and a traditional RDBMS?
- Schema: Hive uses schema-on-read, meaning the schema is applied to the data when it's read. RDBMS uses schema-on-write, where the schema is enforced when data is written.
- Workload: Hive is designed for batch processing and large-scale analytics (OLAP), not for real-time transactional queries (OLTP).
- Updates: Hive does not support row-level updates and deletes in the same way as an RDBMS (though it has some support via ACID transactions).
- Data Storage: Hive stores data in a distributed file system like HDFS, while RDBMS typically uses its own optimized storage format.

3. What are the main components of the Hive architecture?
The main components are:
- Metastore: Stores the metadata for Hive tables, such as their schema and location.
- Driver: Manages the lifecycle of a HiveQL query as it moves through Hive.
- Compiler: Compiles the HiveQL query into a directed acyclic graph (DAG) of MapReduce or Tez jobs.
- Optimizer: Performs various transformations on the execution plan to optimize it.
- Execution Engine: Executes the compiled jobs in the correct order on Hadoop.
4. What is the Hive Metastore?
The Hive Metastore is a central repository that stores all the metadata about Hive tables, partitions, schemas, and locations. It can be configured in different modes (embedded, local, or remote) and typically uses a relational database like MySQL or PostgreSQL for storage.
5. What are the different modes of Hive Metastore?
- Embedded Mode: Metastore runs in the same JVM as Hive and uses an embedded Derby database (suitable for testing only).
- Local Mode: Metastore runs in the same host as Hive but uses a separate database like MySQL.
- Remote Mode: Metastore runs on a separate server and can be accessed by multiple Hive instances.
6. What is HiveQL?
HiveQL (HQL) is Hive's query language similar to SQL. It allows users to write queries that are similar to SQL but are converted into MapReduce, Tez, or Spark jobs for execution on Hadoop.
7. What are the different execution engines supported by Hive?
- MapReduce: The original execution engine; slowest, with heavy disk I/O between stages
- Tez: Builds a directed acyclic graph (DAG) of tasks, avoiding unnecessary intermediate writes; the default engine in many distributions
- Spark: Uses Apache Spark for in-memory processing; generally much faster than MapReduce, and comparable to Tez depending on the workload
8. What is the difference between Hive and Pig?
- Hive: Uses SQL-like language (HQL), better for structured data and users familiar with SQL
- Pig: Uses Pig Latin scripting language, better for complex data pipelines and ETL processes
9. What is the difference between Hive and Impala?
- Hive: Batch processing, higher latency, more fault-tolerant
- Impala: Interactive queries, lower latency, less fault-tolerant
10. What are the limitations of Hive?
- High latency (not suitable for real-time queries)
- Limited support for row-level updates and deletes
- Not designed for OLTP workloads
- Sub-optimal for small datasets
Data Model: Tables, Partitions & Buckets
11. What are the different types of tables in Hive?
- Managed (Internal) Tables: Hive manages both the metadata and the data. Dropping the table deletes the data.
- External Tables: Hive only manages the metadata. Dropping the table doesn't delete the underlying data.
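A minimal sketch of both table types (the table names, columns, and HDFS path are illustrative):

```sql
-- Managed table: Hive owns the data under its warehouse directory.
-- DROP TABLE removes both the metadata and the files.
CREATE TABLE sales_managed (
  order_id BIGINT,
  amount   DOUBLE
)
STORED AS ORC;

-- External table: Hive tracks only the metadata; DROP TABLE leaves
-- the files at the LOCATION untouched.
CREATE EXTERNAL TABLE sales_external (
  order_id BIGINT,
  amount   DOUBLE
)
STORED AS ORC
LOCATION '/data/landing/sales/';
```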
12. When would you use an external table vs a managed table?
- Use external tables when data is used by multiple tools or needs to persist after table deletion
- Use managed tables when Hive should have full control over the data lifecycle
13. What is partitioning in Hive?
Partitioning is a method to divide a table into related parts based on the values of particular columns (like date, country, etc.). Each partition corresponds to a specific value of the partition column and is stored as a subdirectory in the table's directory.
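For example, a table partitioned by country and date (names are illustrative) produces one subdirectory per partition, and filters on the partition columns let Hive prune directories:

```sql
-- Data lands under .../country=US/dt=2024-01-01/ and so on.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (country STRING, dt STRING)
STORED AS ORC;

-- Only the matching subdirectory is scanned (partition pruning):
SELECT count(*) FROM page_views
WHERE country = 'US' AND dt = '2024-01-01';
```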
14. What are the benefits of partitioning?
- Faster query execution (partition pruning)
- Better data organization
- Easier data management
- Improved performance for range queries
15. What is bucketing in Hive?
Bucketing is a technique that decomposes data into more manageable parts (buckets) based on the hash function of a column in the table. Each bucket is stored as a file in the partition directory.
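A hypothetical bucketed table: rows are assigned to one of 32 files by hash(user_id) % 32, which also enables efficient sampling:

```sql
CREATE TABLE user_events (
  user_id BIGINT,
  event   STRING
)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

-- Read roughly 1/32 of the data by scanning a single bucket:
SELECT * FROM user_events TABLESAMPLE(BUCKET 1 OUT OF 32 ON user_id);
```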
16. What are the benefits of bucketing?
- Faster joins (map-side joins)
- Efficient sampling
- Better performance for certain queries
- More balanced data distribution
17. What is the difference between partitioning and bucketing?
- Partitioning: Creates directories based on column values, good for filtering
- Bucketing: Creates files based on hash values, good for joins and sampling
18. What is dynamic partitioning in Hive?
Dynamic partitioning allows Hive to automatically create partitions based on the values of the partition columns in the inserted data. You don't need to specify each partition manually.
19. What is static partitioning in Hive?
Static partitioning requires you to explicitly specify the partition values when loading data into the table. You need to know the partition values in advance.
20. How do you enable dynamic partitioning in Hive?
Set the following properties:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
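With those properties set, a fully dynamic insert might look like this (table and column names are illustrative); note the partition column comes last in the SELECT list:

```sql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Hive creates one partition per distinct dt value in the source data.
INSERT OVERWRITE TABLE page_views PARTITION (dt)
SELECT user_id, url, view_date AS dt
FROM raw_page_views;
```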
21. What is a view in Hive?
A view is a virtual table based on the result set of a HiveQL query. It doesn't store data itself but provides a way to simplify complex queries or restrict access to certain data.
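A hypothetical view that hides a join and exposes only two columns:

```sql
CREATE VIEW us_orders AS
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'US';

-- Queried like a table; the underlying query runs at read time.
SELECT * FROM us_orders LIMIT 10;
```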
22. What is an index in Hive?
An index in Hive is a data structure that improves the speed of data retrieval operations on a table. Hive historically supported compact and bitmap indexes, but indexing was deprecated and removed in Hive 3.0; materialized views and the built-in indexes of columnar formats like ORC and Parquet are the recommended alternatives.
23. What are the different types of indexes in Hive?
- Compact Index: Stores the HDFS block IDs for each indexed value
- Bitmap Index: Uses bitmap vectors for columns with low cardinality
24. What is a transactional table in Hive?
Transactional tables (ACID tables) support atomicity, consistency, isolation, and durability. They allow operations like INSERT, UPDATE, and DELETE at row level.
25. How do you create a transactional table in Hive?
Use ORC storage with the TBLPROPERTIES clause (on Hive versions before 3.0 the table must also be bucketed):
CREATE TABLE table_name (...)
CLUSTERED BY (id) INTO 4 BUCKETS -- required only before Hive 3.0
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
HiveQL: DDL, DML & Functions
26. What are the different data types supported by Hive?
- Primitive: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, BOOLEAN, STRING, TIMESTAMP, etc.
- Complex: ARRAY, MAP, STRUCT, UNION
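A hypothetical table combining the complex types, with the access syntax for each:

```sql
CREATE TABLE employees (
  name           STRING,
  skills         ARRAY<STRING>,
  salary_history MAP<INT, DOUBLE>,              -- year -> salary
  address        STRUCT<city:STRING, zip:STRING>
);

-- Arrays and maps use [], struct fields use dot syntax:
SELECT skills[0], salary_history[2024], address.city
FROM employees;
```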
27. How do you load data into a Hive table?
Using the LOAD DATA statement:
LOAD DATA [LOCAL] INPATH 'file_path' [OVERWRITE] INTO TABLE table_name [PARTITION (part_col=val)];
28. What is the difference between LOAD DATA and INSERT in Hive?
- LOAD DATA: Moves or copies files into Hive's warehouse directory (faster)
- INSERT: Processes data through MapReduce/Tez/Spark (slower but more flexible)
29. How do you insert data into a Hive table?
Using INSERT statements:
INSERT INTO TABLE table_name [PARTITION (part_col=val)] SELECT ...;
INSERT OVERWRITE TABLE table_name [PARTITION (part_col=val)] SELECT ...;
30. What is the difference between INSERT INTO and INSERT OVERWRITE?
- INSERT INTO: Appends data to the table
- INSERT OVERWRITE: Replaces the existing data in the table or partition
31. How do you update data in a Hive table?
For transactional tables, use the UPDATE statement:
UPDATE table_name SET column = value [WHERE condition];
32. How do you delete data from a Hive table?
For transactional tables, use the DELETE statement:
DELETE FROM table_name [WHERE condition];
33. What are the different types of joins in Hive?
- INNER JOIN
- LEFT OUTER JOIN
- RIGHT OUTER JOIN
- FULL OUTER JOIN
- LEFT SEMI JOIN
- CROSS JOIN (Cartesian product)
34. What is a map join in Hive?
A map join is a type of join where a small table is loaded into memory and joined with the larger table without requiring a reduce phase. This can significantly improve performance.
35. How do you enable a map join in Hive?
Set the following property or use the MAPJOIN hint:
SET hive.auto.convert.join = true;
SELECT /*+ MAPJOIN(small_table) */ ... FROM large_table JOIN small_table ...;
36. What are user-defined functions (UDFs) in Hive?
UDFs allow users to extend HiveQL functionality by writing custom functions in Java. There are three types:
- UDF: Regular function (one row in, one row out)
- UDAF: Aggregate function (multiple rows in, one row out)
- UDTF: Table-generating function (one row in, multiple rows out)
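Registering and calling a custom UDF from HiveQL might look like this (the JAR path, class name, and function name are placeholders for your own implementation):

```sql
-- Make the JAR containing the UDF class available to the session.
ADD JAR /tmp/my-udfs.jar;

-- Register the Java class under a SQL-callable name.
CREATE TEMPORARY FUNCTION normalize_phone
AS 'com.example.hive.udf.NormalizePhone';

SELECT normalize_phone(phone_number) FROM customers;
```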
37. What are the different types of built-in functions in Hive?
- Mathematical functions (ROUND, CEIL, FLOOR, etc.)
- String functions (CONCAT, SUBSTR, UPPER, LOWER, etc.)
- Date functions (YEAR, MONTH, DAY, DATEDIFF, etc.)
- Conditional functions (CASE, COALESCE, etc.)
- Aggregate functions (SUM, COUNT, AVG, MAX, MIN, etc.)
38. What is the ORDER BY clause in Hive?
ORDER BY performs a global sort of all data using a single reducer. It should be used with LIMIT to avoid performance issues with large datasets.
39. What is the SORT BY clause in Hive?
SORT BY sorts data within each reducer. It doesn't guarantee a global order but is more efficient than ORDER BY for large datasets.
40. What is the DISTRIBUTE BY clause in Hive?
DISTRIBUTE BY controls how map output is distributed to reducers. It ensures that all rows with the same values for the specified columns go to the same reducer.
41. What is the CLUSTER BY clause in Hive?
CLUSTER BY is a combination of DISTRIBUTE BY and SORT BY using the same columns. It distributes and sorts the data on the same set of columns.
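The four clauses side by side, on a hypothetical sales table:

```sql
-- Global total order, single reducer (use LIMIT on big data):
SELECT * FROM sales ORDER BY amount DESC LIMIT 100;

-- Sorted within each reducer only; no global guarantee:
SELECT * FROM sales SORT BY amount DESC;

-- Same region always lands on the same reducer, sorted within it:
SELECT * FROM sales DISTRIBUTE BY region SORT BY region, amount;

-- Shorthand for DISTRIBUTE BY region SORT BY region:
SELECT * FROM sales CLUSTER BY region;
```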
42. How do you handle NULL values in Hive?
Hive treats NULL values as special values. You can use functions like:
- IS NULL / IS NOT NULL
- COALESCE (returns first non-NULL value)
- NVL (replaces NULL with specified value)
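The three in one query (column names are illustrative):

```sql
SELECT
  order_id,
  COALESCE(discount, bonus, 0) AS first_non_null,  -- first non-NULL of the list
  NVL(discount, 0)             AS discount_or_zero -- two-argument fallback
FROM orders
WHERE shipped_date IS NOT NULL;
```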
43. What is the difference between LIKE and RLIKE in Hive?
- LIKE: Simple pattern matching with % and _ wildcards
- RLIKE: Regular expression matching using Java regex patterns
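For example (patterns are illustrative):

```sql
-- LIKE: % matches any sequence, _ matches exactly one character.
SELECT * FROM users WHERE email LIKE '%@example.com';

-- RLIKE: Java regex; true if the pattern matches anywhere in the string.
SELECT * FROM users WHERE email RLIKE '^[a-z]+@example\\.(com|org)$';
```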
44. What are lateral views in Hive?
Lateral views are used in conjunction with user-defined table generating functions (UDTFs) to explode complex data types like arrays or maps into multiple rows.
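A typical use, assuming a hypothetical orders table where items is an ARRAY<STRING>; each array element becomes its own row:

```sql
SELECT o.order_id, item
FROM orders o
LATERAL VIEW explode(o.items) items_table AS item;
```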
45. How do you use EXPLAIN in Hive?
The EXPLAIN command shows the execution plan for a query, which helps in understanding how Hive will execute it and identifying potential optimizations.
EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION] query;
Data Storage & SerDe
46. What is SerDe in Hive?
SerDe stands for Serializer/Deserializer. It's an interface that tells Hive how to process a record. The Serializer converts Java objects to a format that Hive can write to HDFS, and the Deserializer converts the stored format back to Java objects.
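For example, a table over CSV files can delegate parsing to the bundled OpenCSVSerde (column names and location are illustrative):

```sql
CREATE EXTERNAL TABLE csv_sales (
  order_id STRING,
  amount   STRING   -- OpenCSVSerde yields strings; cast at query time
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
LOCATION '/data/csv/sales/';
```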
47. What are the different file formats supported by Hive?
- TextFile (default)
- SequenceFile
- RCFile (Record Columnar File)
- ORC (Optimized Row Columnar)
- Parquet
- Avro
48. What is the difference between ORC and Parquet?
- ORC: Hive-native format, better compression, supports ACID transactions
- Parquet: More general-purpose, better for nested data, widely supported across Hadoop ecosystem
49. What are the advantages of using ORC format?
- Better compression
- Predicate pushdown
- Faster queries
- Support for ACID transactions
- Lightweight indexes
50. What are the advantages of using Parquet format?
- Columnar storage
- Efficient compression
- Predicate pushdown
- Good support for nested data
- Wide ecosystem support
51. What is compression in Hive and why is it important?
Compression reduces the size of data stored in HDFS, which saves storage space and improves query performance by reducing I/O. Common compression codecs include Snappy, Gzip, LZO, and Bzip2.
52. How do you enable compression in Hive?
Set the following properties:
SET hive.exec.compress.output = true;
SET mapreduce.output.fileoutputformat.compress = true;
SET mapreduce.output.fileoutputformat.compress.codec = org.apache.hadoop.io.compress.SnappyCodec;
53. What is the difference between Snappy and Gzip compression?
- Snappy: Faster compression/decompression, lower compression ratio
- Gzip: Slower compression/decompression, higher compression ratio
54. What is a custom SerDe in Hive?
A custom SerDe allows you to define how Hive should serialize and deserialize data for specific formats not natively supported. You can write your own SerDe by implementing the SerDe interface.
55. How do you specify a file format when creating a table?
Use the STORED AS clause:
CREATE TABLE table_name (...)
STORED AS ORC; -- or PARQUET, SEQUENCEFILE, etc.
Performance Tuning & Optimization
56. What are some common techniques to optimize Hive queries?
- Partitioning and bucketing
- Using appropriate file formats (ORC, Parquet)
- Enabling compression
- Using vectorization
- Enabling Tez or Spark execution engine
- Using map joins for small tables
57. What is vectorization in Hive?
Vectorization allows Hive to process a batch of rows together instead of one row at a time, reducing CPU usage and improving performance. Enable it with:
SET hive.vectorized.execution.enabled = true;
58. What is predicate pushdown in Hive?
Predicate pushdown is an optimization technique where filtering is pushed down to the storage layer, reducing the amount of data that needs to be read and processed.
59. How do you control the number of mappers and reducers in Hive?
For mappers, the count is determined by the input format and split size. For reducers, set an explicit count:
SET mapreduce.job.reduces = number;
Or let Hive size the reducer count automatically from the data volume:
SET hive.exec.reducers.bytes.per.reducer = 268435456;
SET hive.exec.reducers.max = 1009;
60. What is the significance of the hive.fetch.task.conversion property?
This property controls whether certain queries can be executed without MapReduce. When set to "more", simple queries like SELECT with LIMIT or filtering on partition columns can run faster without launching a MapReduce job.
61. How do you handle data skew in Hive?
- Use the SKEWED BY clause when creating tables
- Enable skew join optimization
- Use multiple reducers
- Manually handle skewed keys in queries
62. What is the purpose of the hive.optimize.skewjoin property?
When set to true, this property enables Hive to handle skew in join operations by creating separate jobs for skewed keys.
63. How do you optimize joins in Hive?
- Use map joins for small tables
- Place larger table last in the join order
- Use bucketing on join keys
- Enable parallel execution
64. What is the purpose of the hive.auto.convert.join property?
When set to true, this property allows Hive to automatically convert joins to map joins when appropriate, based on the size of the tables.
65. How do you optimize GROUP BY operations in Hive?
- Use the hive.map.aggr property to do partial aggregation in the map phase
- Use the hive.groupby.skewindata property to handle skew in grouping keys
- Ensure proper partitioning and bucketing
66. What is the purpose of the hive.groupby.skewindata property?
When set to true, this property enables Hive to handle skew in GROUP BY operations by launching an additional MapReduce job to distribute the data more evenly.
67. How do you optimize ORDER BY operations in Hive?
- Use LIMIT to reduce the amount of data sorted
- Consider using SORT BY instead if global ordering isn't required
- Increase the number of reducers for better parallelism
68. What is the purpose of the hive.exec.parallel property?
When set to true, this property allows Hive to execute independent stages of a query in parallel, which can improve performance for complex queries.
69. How do you handle small files in Hive?
- Use the hive.merge.mapfiles and hive.merge.mapredfiles properties to merge small files
- Use the hive.merge.size.per.task and hive.merge.smallfiles.avgsize properties to control merge thresholds
- Use appropriate partitioning and bucketing to avoid creating many small files
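Typical merge settings (the byte values are illustrative; tune them for your cluster):

```sql
-- Merge small output files after map-only and map-reduce jobs.
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;
-- Target size of merged files, and the average file size below which
-- an extra merge job is triggered.
SET hive.merge.size.per.task = 268435456;     -- 256 MB
SET hive.merge.smallfiles.avgsize = 16777216; -- 16 MB
```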
70. What is the purpose of the hive.exec.dynamic.partition.mode property?
This property controls the mode for dynamic partitioning:
- strict: Requires at least one static partition
- nonstrict: Allows fully dynamic partitioning
71. How do you monitor and analyze query performance in Hive?
- Use the EXPLAIN command to see the execution plan
- Check the Hive logs for detailed information
- Use Hadoop job tracker to monitor MapReduce jobs
- Use performance counters and metrics
72. What is the cost-based optimizer (CBO) in Hive?
The Cost-Based Optimizer uses table and column statistics to generate more efficient execution plans. Enable it with:
SET hive.cbo.enable = true;
73. How do you collect statistics in Hive?
Use the ANALYZE TABLE command:
ANALYZE TABLE table_name [PARTITION(part_col=val)] COMPUTE STATISTICS [FOR COLUMNS];
74. What is the LLAP (Live Long and Process) feature in Hive?
LLAP is a daemon that runs on worker nodes to provide caching, pre-fetching, and query fragment execution. It enables faster interactive queries by keeping data in memory.
75. How do you enable Tez as the execution engine in Hive?
Set the following property:
SET hive.execution.engine = tez;
Security & Integration
76. What are the different authentication methods in Hive?
- LDAP authentication
- Kerberos authentication
- Custom authentication plugins
77. What are the different authorization methods in Hive?
- Storage-based authorization (HDFS permissions)
- SQL standards-based authorization
- Apache Ranger integration
- Apache Sentry integration
78. How do you enable security in Hive?
Set the following properties:
SET hive.security.authorization.enabled = true;
SET hive.security.authorization.manager = org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory;
79. What is the difference between Hive and HBase?
- Hive: Data warehouse for batch processing, SQL-like queries
- HBase: NoSQL database for real-time read/write access, key-value storage
80. How do you integrate Hive with HBase?
Create an external Hive table that maps to an HBase table:
CREATE EXTERNAL TABLE hive_table(key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
81. How do you integrate Hive with Amazon S3?
Use the s3a:// protocol in the table location:
CREATE EXTERNAL TABLE table_name (...)
LOCATION 's3a://bucket-name/path/';
82. How do you integrate Hive with JDBC-compliant databases?
Use the JDBC storage handler (org.apache.hive.storage.jdbc.JdbcStorageHandler, available since Hive 3) to create an external table that maps to a table in the JDBC-compliant database, supplying the connection URL and credentials via TBLPROPERTIES.
83. What is the Hive metastore service?
The Hive metastore service is a Thrift service that provides metadata access to Hive and other tools. It can be run in embedded, local, or remote mode.
84. How do you connect to Hive from other applications?
Use the Hive JDBC or ODBC drivers to connect from applications like Tableau, Excel, or custom Java applications.
85. What is the Hive Web Interface (HWI)?
The Hive Web Interface was a simple web-based GUI for running Hive queries and browsing tables. It was deprecated and removed (in Hive 2.2) in favor of more modern tools like Hue and Apache Zeppelin.
Advanced Topics & Ecosystem
86. What is Apache Tez?
Apache Tez is a framework for building high-performance batch and interactive data processing applications on Hadoop. It's designed as a more efficient replacement for MapReduce and is the default execution engine for Hive in many distributions.
87. What is the difference between Hive on MapReduce and Hive on Tez?
- MapReduce: Slower, more disk I/O, separate jobs for each stage
- Tez: Faster, less disk I/O, optimized execution plan with DAG
88. What is HiveServer2?
HiveServer2 is an improved version of HiveServer that supports multi-client concurrency and authentication. It provides a Thrift interface for clients to execute queries against Hive.
89. What is the difference between HiveServer and HiveServer2?
- HiveServer: Single-threaded, no authentication, deprecated
- HiveServer2: Multi-threaded, supports authentication, better security
90. What is Beeline in Hive?
Beeline is a command-line shell for HiveServer2 that uses JDBC to connect. It replaces the older Hive CLI (command line interface).
91. What is Apache Hue?
Apache Hue is a web interface for analyzing data with Apache Hadoop. It provides a SQL editor for Hive, making it easier to write and execute queries through a browser.
92. What is Apache Zeppelin?
Apache Zeppelin is a web-based notebook that enables interactive data analytics. It supports Hive and many other data processing systems through interpreter plugins.
93. What is the difference between Hive and Spark SQL?
- Hive: Batch processing, uses MapReduce/Tez, higher latency
- Spark SQL: In-memory processing, lower latency, better for iterative algorithms
94. What is Hive on Spark?
Hive on Spark allows Hive to use Apache Spark as its execution engine instead of MapReduce or Tez. This combines Hive's mature SQL capabilities with Spark's in-memory processing performance.
95. What is the future of Hive in the big data ecosystem?
Hive continues to evolve with features like LLAP for interactive queries, improved ACID support, and better integration with other technologies like Spark and cloud storage. While newer technologies like Spark SQL are gaining popularity, Hive remains important for batch processing and as a metastore service.
96. What are Hive streaming APIs?
Hive streaming APIs allow continuous ingestion of data into Hive tables. They are particularly useful for streaming data from systems like Kafka into Hive for near-real-time analytics.
97. What is the Hive ACID feature?
Hive ACID (Atomicity, Consistency, Isolation, Durability) provides transactional capabilities, allowing operations like INSERT, UPDATE, and DELETE at row level. This is implemented using ORC file format and a transaction manager.
98. How do you enable ACID transactions in Hive?
Set the following properties and use transactional tables (hive.enforce.bucketing applies only to Hive 1.x; it was removed in Hive 2.0, where bucketing is always enforced):
SET hive.support.concurrency = true;
SET hive.enforce.bucketing = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on = true;
SET hive.compactor.worker.threads = 1;
99. What is the purpose of the Hive metastore database?
The Hive metastore database (usually MySQL or PostgreSQL) stores metadata about Hive tables, such as their schema, location, partitioning information, and statistics. This allows multiple Hive instances to share the same metadata.
100. How do you backup and restore Hive metadata?
To backup Hive metadata, you need to backup the metastore database. The process depends on the database used (e.g., mysqldump for MySQL). To restore, you would restore the database backup and ensure the HDFS data locations are intact.