Database / Snowflake Interview Questions
Snowflake is a cloud-native data platform delivered as a fully managed SaaS, built on cloud object storage (S3, Azure Blob, GCS). Its defining architectural trait is the complete decoupling of storage from compute, which distinguishes it from traditional warehouse architectures.
Traditional warehouses use either shared-nothing (Teradata, Redshift) where each node owns local disks — adding compute forces proportional storage growth — or shared-disk (Oracle RAC) where a central storage bus creates a bottleneck. Snowflake's multi-cluster shared-data model stores all data in cloud object storage and lets any number of independent Virtual Warehouses read from it simultaneously with no contention. An ETL warehouse and a BI warehouse can query the same petabyte at the same time, each billed only for their own compute seconds.
| Aspect | Shared-Nothing (Redshift) | Snowflake Multi-Cluster |
|---|---|---|
| Storage/compute coupling | Tightly coupled per node | Fully decoupled |
| Multiple concurrent workloads | One cluster, resource contention | Independent warehouses, no contention |
| Scale compute alone | No — nodes carry their own storage | Yes — resize warehouse without touching data |
| Idle cost | Pay for cluster 24/7 | Suspend warehouse = zero compute billing |
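A minimal sketch of the decoupling in practice (warehouse and table names are illustrative): two independently created warehouses query the same shared data concurrently, each billed only for its own compute.

```sql
-- Two independent warehouses, sized for their workloads (names illustrative)
CREATE WAREHOUSE etl_wh WAREHOUSE_SIZE = 'LARGE' AUTO_SUSPEND = 60  AUTO_RESUME = TRUE;
CREATE WAREHOUSE bi_wh  WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;

-- Session A: heavy ETL against the shared storage layer
USE WAREHOUSE etl_wh;
INSERT INTO sales SELECT * FROM staging_sales;

-- Session B (running concurrently): BI query on the same table, no contention
USE WAREHOUSE bi_wh;
SELECT region, SUM(amount) FROM sales GROUP BY region;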
Snowflake is built on three independently operating layers, each with a clearly defined responsibility.
Storage Layer holds all data as columnar, compressed micro-partition files in cloud object storage (S3, Azure Blob, or GCS). Snowflake manages all storage operations automatically — no manual partitioning, indexing, or vacuuming required. The data is always available to any compute cluster that needs it.
Compute Layer consists of Virtual Warehouses — MPP clusters from X-Small to 6X-Large. Each warehouse is a temporary set of cloud compute instances that reads from storage, executes queries, and writes results. Warehouses start, resize, and suspend in seconds, completely independently of the data layer.
Cloud Services Layer is Snowflake's always-on brain. It handles query parsing and optimization, micro-partition metadata management (min/max values per column per partition), authentication and authorization, ACID transaction management, and infrastructure coordination. It runs continuously and is only billed when daily consumption exceeds 10% of compute credits.
-- Create a Virtual Warehouse in the Compute layer
CREATE WAREHOUSE reporting_wh
WAREHOUSE_SIZE = 'LARGE'
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE;
-- COUNT(*) is answered from Cloud Services metadata -- zero compute credits
SELECT COUNT(*) FROM orders;
A Virtual Warehouse is a named MPP compute cluster in Snowflake that executes SQL queries, loads data, and performs DML operations. Unlike traditional databases where compute and storage share the same physical nodes, a Virtual Warehouse borrows cloud compute on demand and reads data directly from the shared cloud object storage layer — no data is moved when a warehouse starts, resizes, or suspends.
Warehouses scale in two dimensions. Scale up (resize to a larger T-shirt size) gives more memory and CPU per query — useful for complex aggregations, large sorts, or multi-table joins. Sizes double credits per hour: X-Small=1 credit, Small=2, Medium=4, Large=8, X-Large=16, 2X-Large=32, 3X-Large=64, 4X-Large=128. Scale out (multi-cluster warehouses, Enterprise+) adds parallel clusters to handle concurrent user workloads. Resizing is near-instant and affects only compute — the storage layer is completely untouched.
Auto-suspend stops billing when the warehouse has been idle for a configured number of seconds (60 is the practical minimum; 0 or NULL disables it). Auto-resume brings the warehouse back on the next query, typically within a second or two. Together they make pay-per-second compute practical for real workloads.
CREATE WAREHOUSE etl_wh
WAREHOUSE_SIZE = 'X-LARGE'
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE;
-- Resize without any data movement
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = '2X-LARGE';
-- Suspend manually; billing stops immediately
ALTER WAREHOUSE etl_wh SUSPEND;
In row-based storage (MySQL, PostgreSQL, Oracle), a complete row is written as a contiguous block on disk. This is optimal for OLTP: a single-row read or update is one I/O. But analytical queries that select 3 columns out of 100 must scan the full row for every record — wasting I/O on irrelevant data.
In columnar storage (Snowflake, BigQuery, Redshift), all values from the same column are stored contiguously. An analytical SELECT of 3 columns skips the other 97 entirely — no wasted I/O. Values of the same data type also compress far better: run-length encoding, delta encoding, and dictionary encoding on homogeneous column data yield 3–10x compression ratios versus mixed-type row blocks. Snowflake additionally stores per-column min/max metadata per micro-partition, enabling partition pruning that is impossible with row-oriented storage.
| Aspect | Row-Based | Columnar (Snowflake) |
|---|---|---|
| Best for | OLTP (point reads/writes) | OLAP (scans, aggregations) |
| Compression | Moderate (mixed types per block) | High (same type per column) |
| Partial column query | Must read full row | Only reads selected columns |
| Single-row access | Fast — 1 I/O | Slower — reconstructs across column blocks |
The Cloud Services layer is Snowflake's always-on intelligence layer. It runs within Snowflake's managed infrastructure — not on customer-provisioned resources — and coordinates everything before and after a Virtual Warehouse executes a query. It is billed only when its daily consumption exceeds 10% of compute credits; in most workloads this threshold is never crossed.
The six primary components it manages:
- Query Parser and Optimizer — parses SQL, resolves names, builds the execution plan, and selects which micro-partitions to scan via partition pruning.
- Metadata Service — stores min/max column values, null counts, row counts, and distinct value estimates per micro-partition for every table. Enables partition pruning and zero-credit COUNT(*) answers.
- Authentication and Authorization — handles logins (password, key-pair, OAuth, SAML/SSO) and all RBAC privilege checks before a query touches any data.
- Transaction Manager — enforces ACID semantics, manages multi-statement transactions, and maintains snapshot isolation for concurrent readers and writers.
- Infrastructure Manager — provisions and monitors Virtual Warehouse compute nodes; drives auto-suspend and auto-resume events.
- Security Service — manages the hierarchical AES-256 key model, handles Tri-Secret Secure key requests, and enforces column and row security policies at query runtime.
Micro-partitions are the fundamental storage units in Snowflake. When data is loaded, Snowflake automatically divides it into contiguous blocks of 50–500 MB of uncompressed data (typically ~16 MB compressed on disk). Each micro-partition is stored as a columnar compressed file in cloud object storage. The key optimization property comes from the rich metadata Snowflake stores for each partition in the Cloud Services layer — not the data itself.
Per micro-partition, Snowflake records: minimum and maximum column values for every column, null count, distinct value count, and row count. When the query optimizer evaluates a predicate like WHERE order_date = '2024-03-15', it reads this metadata and skips any micro-partition whose max(order_date) is earlier or whose min(order_date) is later. This is partition pruning — no data bytes are read from skipped partitions.
Natural clustering refers to the fact that data loaded chronologically (e.g., daily appended events) groups similar values into the same micro-partitions without any intervention. For tables where load order does not match query patterns, an explicit clustering key tells Snowflake to reorganize the data, and the Automatic Clustering service maintains that order over time. The better the clustering, the fewer micro-partitions must be scanned per predicate, directly reducing query cost.
-- Check clustering quality of the sales table
SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(order_date)');
-- Micro-partition statistics via Information Schema
SELECT table_name, clustering_key, row_count, bytes
FROM information_schema.tables
WHERE table_schema = 'PUBLIC';
Data clustering describes how closely the physical order of rows across micro-partitions aligns with the columns used in query predicates. When data is well-clustered on a column, rows with similar values land in the same micro-partitions. A range predicate on that column then prunes away most partitions because their min/max ranges do not overlap the filter value.
A clustering key is an explicit declaration telling Snowflake to sort and co-locate data based on one or more columns or expressions. The Automatic Clustering background service (Enterprise+) periodically re-clusters micro-partitions to maintain this sort order as new data arrives. Snowflake charges for the background compute used.
The key metric is average_depth from SYSTEM$CLUSTERING_INFORMATION(): a value of 1 means perfect clustering (no overlapping micro-partitions); higher values mean more overlap and worse pruning efficiency.
Define a clustering key when: the table is larger than ~1 TB; queries frequently filter on a predictable column set (date, region, category); and data does not arrive pre-sorted. Skip it when: the table is small, accessed randomly, or frequently truncated and reloaded.
-- Define clustering key on sales table
ALTER TABLE sales CLUSTER BY (order_date, region);
-- Check current clustering quality
SELECT PARSE_JSON(
SYSTEM$CLUSTERING_INFORMATION('sales', '(order_date, region)')
):average_depth::FLOAT AS clustering_depth;
-- Query that benefits from clustering-based pruning
SELECT product_id, SUM(amount)
FROM sales
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31'
AND region = 'EMEA'
GROUP BY product_id;
Snowflake offers four service editions, each a strict superset of the previous. Edition choice is driven primarily by compliance requirements, data sensitivity, and advanced concurrency needs.
| Feature | Standard | Enterprise | Business Critical | VPS |
|---|---|---|---|---|
| Multi-cluster warehouses | No | Yes | Yes | Yes |
| Time Travel retention | Up to 1 day | Up to 90 days | Up to 90 days | Up to 90 days |
| Materialized Views | No | Yes | Yes | Yes |
| Column/row security policies | No | Yes | Yes | Yes |
| HIPAA / PCI DSS compliance | No | No | Yes | Yes |
| Tri-Secret Secure (CMK) | No | No | Yes (add-on) | Yes |
| AWS PrivateLink / Azure PL | No | No | Yes | Yes |
| Dedicated hardware isolation | No | No | No | Yes |
| Account failover/failback | No | No | Yes | Yes |
Virtual Private Snowflake (VPS) runs on dedicated hardware fully isolated from all other Snowflake customers. It targets financial institutions and government agencies requiring hardware-level isolation and customer-managed encryption throughout the entire environment.
Snowflake accounts are created on a specific cloud platform (AWS, Azure, or GCP) and in a specific region. While the primary account is tied to that choice, Snowflake supports replicating data and account objects across regions and even across cloud providers — enabling disaster recovery, regional data residency compliance, and low-latency access for global users.
Database Replication copies individual databases from a primary account to one or more secondary (read-only replica) accounts on any supported cloud/region combination. Replicas are kept in sync on a configurable refresh schedule. Replication Groups extend this to bundle multiple databases and other objects into a single consistent snapshot unit.
Failover Groups add the ability to promote a secondary account to primary (failover) and demote the original (failback), delivering true disaster recovery with a configurable RPO. Account-level replication (Business Critical+) also replicates users, roles, resource monitors, and warehouses.
- Cross-cloud replication (e.g., AWS → Azure) works the same as cross-region but incurs cloud egress charges.
- Within the same cloud provider, cross-region replication is cheaper.
- Secondary accounts serve read-only queries and regional data sharing to local consumers.
- Replication is billed: compute credits for the refresh run plus data transfer fees.
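A hedged sketch of failover group setup (organization, account, and database names are hypothetical; available options vary by edition):

```sql
-- On the primary account: bundle objects for replication and allow failover
CREATE FAILOVER GROUP sales_fg
  OBJECT_TYPES = DATABASES, ROLES
  ALLOWED_DATABASES = sales_db
  ALLOWED_ACCOUNTS = myorg.dr_account        -- hypothetical org/account names
  REPLICATION_SCHEDULE = '10 MINUTE';

-- On the secondary account: create the replica and refresh it
CREATE FAILOVER GROUP sales_fg
  AS REPLICA OF myorg.primary_account.sales_fg;
ALTER FAILOVER GROUP sales_fg REFRESH;

-- Disaster scenario: promote the secondary to primary
ALTER FAILOVER GROUP sales_fg PRIMARY;
```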
In Snowflake, storage and compute are billed as completely independent resources. Storage is charged at a flat rate per terabyte per month of compressed data in cloud object storage, regardless of query volume. Compute is charged per credit-second, and only when a Virtual Warehouse is actively in the STARTED state.
This separation creates three concrete billing advantages. First, an idle warehouse can be suspended in seconds, immediately stopping all compute billing while data remains intact and instantly accessible. Second, multiple warehouses reading the same data cost no extra storage — a marketing analytics warehouse and a data science warehouse can both query the same 50 TB of data, each paying only for their own compute time. Third, you can spin up a powerful warehouse for a one-hour intensive job and pay for only those 3,600 seconds of compute, rather than running that power 24/7.
Compare this to Amazon Redshift provisioned clusters: the cluster runs continuously whether queries execute or not — you pay for every hour it exists. Redshift Serverless partially addresses this, but Snowflake's model is fundamentally more granular and flexible.
- Storage: billed per TB/month compressed (typically 40–60% of uncompressed size).
- Compute: billed per credit-second; 1 credit costs roughly $2–4 depending on edition and cloud region.
- Cloud Services: free up to 10% of daily compute consumption; excess billed separately.
- Data transfer: egress charges apply for cross-region or cross-cloud replication.
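Because storage and compute are metered separately, spend can be inspected per resource. A sketch using the shared ACCOUNT_USAGE views (this view lags real time by up to a few hours):

```sql
-- Compute credits per warehouse over the last 7 days
SELECT warehouse_name,
       SUM(credits_used)                 AS total_credits,
       SUM(credits_used_cloud_services)  AS cloud_services_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_credits DESC;
```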
A stage in Snowflake is a named location holding data files before they are loaded into a table (or after being unloaded from one). Stages abstract the physical storage location so the COPY INTO command works identically whether files sit in Snowflake-managed storage or in your own cloud bucket.
Internal stages are managed by Snowflake within its own cloud storage. Three kinds exist: the user stage (@~, one per user, auto-created), the table stage (@%table_name, one per table, auto-created), and named internal stages (@my_stage, created explicitly, shareable across roles and sessions).
External stages point to customer-owned cloud storage (S3 bucket, Azure Blob container, GCS bucket). Defined with a URL and either inline credentials or a Storage Integration object (recommended — avoids embedding credentials in DDL). Supports all the same file formats as COPY INTO.
-- Create a named internal stage with a named file format
CREATE STAGE my_internal_stage
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1);
-- Upload a local file to the internal stage (run in SnowSQL)
PUT file:///local/path/orders.csv @my_internal_stage;
-- List files in a stage
LIST @my_internal_stage;
-- Create an external stage pointing to S3 using a Storage Integration
CREATE STAGE my_s3_stage
URL = 's3://my-bucket/data/'
STORAGE_INTEGRATION = my_s3_integration
FILE_FORMAT = (TYPE = 'PARQUET');
COPY INTO is Snowflake's bulk data loading command. It reads files from a stage (internal or external) and loads them into a target table in a single transactional operation. The command tracks which files have already been loaded using a 64-day load metadata store — by default, re-running COPY INTO on the same files is a no-op, preventing accidental duplicates.
Supported file formats: CSV, JSON, Avro, ORC, Parquet, and XML. Format options are specified inline or via a named FILE FORMAT object. Key copy options control error handling and file lifecycle:
- `ON_ERROR = ABORT_STATEMENT` (default) — rolls back the entire load on any error.
- `ON_ERROR = CONTINUE` — skips bad rows and loads the rest.
- `ON_ERROR = SKIP_FILE` — skips any file that contains an error.
- `PURGE = TRUE` — deletes source files from the stage after a successful load.
- `FORCE = TRUE` — reloads files even if already loaded (ignores load metadata).
- `VALIDATION_MODE = RETURN_ERRORS` — dry run; validates files without loading.
COPY INTO orders
FROM @my_s3_stage/orders/
FILE_FORMAT = (
TYPE = 'CSV'
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
SKIP_HEADER = 1
NULL_IF = ('NULL', 'null', ''))
ON_ERROR = 'CONTINUE'
PURGE = TRUE;
-- Validate without loading
COPY INTO orders FROM @my_s3_stage
VALIDATION_MODE = 'RETURN_ERRORS';
Snowpipe is Snowflake's serverless, event-driven data ingestion service. Unlike COPY INTO which requires a running Virtual Warehouse and an explicit trigger, Snowpipe uses Snowflake-managed serverless compute and fires automatically when new files arrive in a stage. Typical latency is under one minute from file arrival to data being queryable.
Snowpipe supports two trigger modes. Auto-ingest (preferred): you configure the cloud storage bucket to emit event notifications — Amazon SQS for S3, Azure Event Grid for Blob Storage, Google Pub/Sub for GCS — to Snowflake. When files land, the notification triggers the pipe automatically with no polling needed. REST API mode: call Snowpipe's REST endpoint programmatically with a list of file paths; useful when you control the producer and need deterministic triggering.
Billing is based on the serverless compute seconds consumed for loading, plus a small per-file overhead charge, rather than warehouse credits. This makes Snowpipe cost-effective for small, frequent file arrivals that would otherwise require a warehouse to run continuously.
-- Create a pipe that auto-triggers on new S3 files
CREATE PIPE orders_pipe
AUTO_INGEST = TRUE
AS
COPY INTO orders
FROM @my_s3_stage/orders/
FILE_FORMAT = (TYPE = 'JSON');
-- After creation, retrieve the SQS ARN and configure S3 event notification
SHOW PIPES LIKE 'orders_pipe';
COPY INTO and Snowpipe both load files from stages into Snowflake tables, but they differ fundamentally in trigger mechanism, compute model, transaction semantics, and cost structure.
| Dimension | COPY INTO (Bulk) | Snowpipe (Micro-batch) |
|---|---|---|
| Trigger | Explicit SQL command (manual, Task, or orchestrator) | Auto-ingest event notification or REST API |
| Compute | Runs on user Virtual Warehouse | Snowflake serverless compute |
| Billing | Credits per second of warehouse time | Compute-seconds for loading |
| Latency | Minutes to hours (batch cadence) | ~1 minute after file arrives |
| Transactions | Full ACID; can be rolled back | At-least-once; no explicit rollback |
| Error handling | ON_ERROR options; full rollback available | Failed files retried automatically |
| Optimal file size | 100–250 MB compressed | Works with smaller files; batches them |
| Best for | Nightly ETL, large batch loads | Near-real-time streaming ingestion |
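To make the trigger row concrete: a scheduled Task is a common way to drive COPY INTO on a batch cadence, in contrast to a pipe's event-driven trigger (task, warehouse, and stage names are illustrative):

```sql
-- Nightly bulk load at 02:00 UTC via a scheduled Task
CREATE TASK nightly_orders_load
  WAREHOUSE = etl_wh
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  COPY INTO orders
  FROM @my_s3_stage/orders/
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- Tasks are created suspended; start the schedule explicitly
ALTER TASK nightly_orders_load RESUME;
```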
Snowflake stores semi-structured data in a VARIANT column — a flexible container that holds any valid JSON, Avro, Parquet, ORC, or XML value up to 16 MB per cell. Unlike traditional columns, VARIANT does not require a predefined schema. You can load raw JSON with an evolving structure and query the fields you need at read time via dot and bracket notation.
Internally, Snowflake parses and columnarizes VARIANT data at load time, extracting paths and types into an efficient encoding. This allows the optimizer to apply column-level statistics on frequently accessed paths, enabling partial partition pruning even for semi-structured data.
- Dot notation: `src:customer.address.city::STRING` — navigates nested objects.
- Bracket notation: `src['customer']['address']` — equivalent to dot notation.
- Array indexing: `src:items[0]:product_id::NUMBER`.
- FLATTEN: a lateral table function that explodes arrays or objects into rows for relational processing.
- Type casting: `::STRING`, `::NUMBER`, `::TIMESTAMP_NTZ` cast extracted values to typed columns for better statistics and comparison performance.
-- Table with a VARIANT column for raw JSON events
CREATE TABLE events (
event_id NUMBER AUTOINCREMENT,
raw VARIANT,
loaded_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
-- Query nested JSON using dot notation with type cast
SELECT
raw:user_id::STRING AS user_id,
raw:event_type::STRING AS event_type,
raw:properties:page::STRING AS page
FROM events
WHERE raw:event_type::STRING = 'pageview';
-- FLATTEN an array of order items into rows
SELECT
o.raw:order_id::STRING AS order_id,
item.value:sku::STRING AS sku,
item.value:qty::NUMBER AS qty
FROM orders o,
LATERAL FLATTEN(INPUT => o.raw:items) item;
A Dynamic Table is a Snowflake object that automatically maintains the result of a defining SQL query and refreshes it to stay within a user-specified staleness limit. You declare what you want the table to contain and how fresh it must be — Snowflake handles all the scheduling, incremental or full refreshes, and dependency tracking.
The TARGET_LAG parameter is the core control: it sets the maximum acceptable staleness (e.g., '5 minutes', '1 hour'). Snowflake refreshes the Dynamic Table on a schedule that keeps actual lag within that target. Refreshes run on a user-specified warehouse or on serverless compute.
Dynamic Tables can be chained: if DT_B depends on DT_A, Snowflake refreshes them in the correct order automatically — similar to a DAG without explicit orchestration. This makes them ideal for multi-step ELT pipelines.
| Aspect | Regular Table | View | Dynamic Table |
|---|---|---|---|
| Data stored | Yes | No | Yes |
| Auto-refreshed | No | N/A | Yes (TARGET_LAG) |
| Supports multi-table joins | Yes | Yes | Yes |
| Best for | Manual ETL loads | Query abstraction | Declarative pipeline results |
CREATE DYNAMIC TABLE customer_summary
TARGET_LAG = '10 minutes'
WAREHOUSE = transform_wh
AS
SELECT
c.customer_id,
c.name,
COUNT(o.order_id) AS total_orders,
SUM(o.amount) AS lifetime_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
Time Travel is Snowflake's ability to query, clone, or restore data as it existed at any point within a configurable retention window. When a row is deleted or updated, Snowflake does not immediately remove the old micro-partitions — it retains them for the duration of the Time Travel period. The original data is accessible via special SQL syntax.
The AT clause queries data as it existed at a specific moment using a timestamp, a negative offset in seconds from now, or a statement ID. The BEFORE clause returns the state just before a named statement executed — useful for recovering from a mistaken DML. After the retention window expires, data moves to the Fail-safe tier.
Retention period: Standard edition = 1 day. Enterprise+ = configurable per object up to 90 days (set via DATA_RETENTION_TIME_IN_DAYS). Longer retention increases storage costs since old micro-partitions are retained. Transient and temporary tables have 0-day Fail-safe and max 1-day Time Travel.
-- Query orders as they existed one hour ago
SELECT * FROM orders AT (OFFSET => -3600);
-- Restore data before an accidental DELETE (use the query_id of the DELETE statement)
SELECT * FROM orders BEFORE (STATEMENT => '01ab23cd-...');
-- Clone a table to its state from yesterday as a point-in-time backup
CREATE TABLE orders_backup
CLONE orders AT (TIMESTAMP => '2024-06-01 09:00:00'::TIMESTAMP_NTZ);
-- Recover a dropped table (must be within retention window)
UNDROP TABLE orders;
Fail-safe is an additional 7-day data protection period that begins after a table's Time Travel retention window expires. It is Snowflake's last-resort disaster recovery safety net — but unlike Time Travel, it is not accessible to customers directly. Only Snowflake Support personnel can initiate a Fail-safe recovery, and only for genuine disaster scenarios.
| Dimension | Time Travel | Fail-safe |
|---|---|---|
| Who can access | The account owner / any authorized user | Snowflake Support only |
| Retention duration | 0–90 days (configurable per object) | Fixed 7 days (not configurable) |
| Trigger | AT/BEFORE syntax; UNDROP command | Support ticket to Snowflake |
| Use case | Accidental DML recovery, cloning, historical analysis | Data centre-level disaster recovery |
| Applies to | Permanent tables, schemas, databases | Permanent tables only |
| Transient/temp tables | 0 or 1 day | 0 days (no Fail-safe) |
| Storage cost | Yes — old micro-partitions billed | Yes — same rate as active storage |
A practical implication: after you drop a table and the Time Travel window closes, you cannot recover it yourself. If the data is critical, open a Snowflake Support case within the 7-day Fail-safe window before it expires permanently.
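The retention behaviour above is controlled per object. A brief sketch (table names illustrative):

```sql
-- Extend Time Travel on a critical table (Enterprise+ allows up to 90 days)
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 30;

-- Transient table: at most 1 day of Time Travel and no Fail-safe,
-- trading recoverability for lower storage cost
CREATE TRANSIENT TABLE staging_orders (
  order_id NUMBER,
  amount   NUMBER(12,2)
);
```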
Query Profile is Snowflake's built-in visual execution plan viewer, available in Snowsight under Query History for any completed or actively running query. It shows the full operator tree for a query execution — every node representing a distinct processing step — along with timing, row counts, and resource metrics at each step. It is your first stop when a query is slower than expected.
Key operator node types: TableScan (reads micro-partitions), Filter, Aggregate, Join, Sort, Exchange (shuffle data between nodes), Projection.
The most important diagnostic signals:
- Partitions Scanned vs Partitions Total — if scanned is close to total, the query is not pruning well. Consider adding a clustering key on the filter columns.
- Bytes Spilled to Local Storage — the warehouse ran out of in-memory space for intermediate results; consider scaling up the warehouse size.
- Bytes Spilled to Remote Storage — even worse spilling to cloud storage; a significantly larger warehouse or a query rewrite is needed.
- Bytes Sent over Network in Join nodes — large data shuffle between nodes; pushing down more filters earlier can reduce this.
- Compilation Time vs Execution Time — very high compilation time relative to execution indicates a complex query plan; simplify CTEs or break into steps.
- % of execution per node — a single node consuming 90%+ of time is the bottleneck; focus optimization there.
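The same signals are also queryable after the fact. A sketch against ACCOUNT_USAGE.QUERY_HISTORY (column names as documented; the 90% pruning threshold is an illustrative cutoff):

```sql
-- Find recent queries with poor pruning or memory spilling
SELECT query_id,
       partitions_scanned,
       partitions_total,
       bytes_spilled_to_local_storage,
       bytes_spilled_to_remote_storage,
       compilation_time,
       execution_time
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND (bytes_spilled_to_remote_storage > 0
       OR partitions_scanned > 0.9 * partitions_total)
ORDER BY execution_time DESC;
```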
Result caching stores the complete output of a query in the Cloud Services layer for 24 hours. When an identical query runs again within that window, Snowflake returns the cached result instantly — no warehouse compute is consumed, no storage is read, and no credits are charged. This is Snowflake's most impactful zero-cost optimization for repeated analytical queries and dashboards.
For the result cache to apply, all five conditions must hold:
- The SQL text is character-for-character identical (whitespace counts).
- The result is still within its retention window: 24 hours, extended on each reuse up to 31 days (the executing warehouse does not matter, since results live in the Cloud Services layer).
- The same role context is active (ensures the user still has access to the underlying data).
- No DML (INSERT, UPDATE, DELETE, MERGE, COPY INTO) has touched any underlying table since the result was cached.
- The query contains no non-deterministic functions: `CURRENT_TIMESTAMP()`, `CURRENT_DATE()`, `RANDOM()`, `UUID_STRING()`, `SEQ4()`, etc.
Common cache-busting mistakes: adding LIMIT 10, appending a comment, changing column aliases, or calling GETDATE() in a WHERE clause. Verify cache usage in the Query Profile: a query served from the cache shows a single QUERY RESULT REUSE node instead of an execution plan.
-- First execution: runs on warehouse, result cached
SELECT region, SUM(revenue) AS total
FROM sales
GROUP BY region;
-- Second execution (identical SQL): served from cache, 0 credits
SELECT region, SUM(revenue) AS total
FROM sales
GROUP BY region;
-- Disabling result cache for benchmarking
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
The metadata cache lives entirely within the Cloud Services layer and is separate from both the result cache and the local SSD cache on Virtual Warehouse nodes. It stores structural and statistical information about every micro-partition for every table: per-column minimum and maximum values, null counts, row counts, and approximate distinct value counts. This metadata is updated in real time as data is loaded, inserted, or deleted.
The metadata cache serves two critical performance functions. First, the query optimizer reads it before any data scan begins, using column min/max ranges to decide which micro-partitions are irrelevant to the predicate (partition pruning) — this happens entirely within Cloud Services and costs no warehouse credits. Second, certain queries can be answered completely from metadata without reading a single data file, consuming zero compute credits:
- `SELECT COUNT(*) FROM orders` — answered from stored row counts.
- `SELECT MIN(order_date), MAX(order_date) FROM orders` — answered from stored min/max.
- `SELECT APPROX_COUNT_DISTINCT(region) FROM orders` — can leverage HyperLogLog sketches instead of a full scan.
- Schema queries: `DESCRIBE TABLE`, `SHOW TABLES`.
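To illustrate the boundary (the orders table from the earlier examples; the status column is hypothetical): adding a filter generally forces a scan, because the answer is no longer a stored statistic.

```sql
-- Metadata-only: row count and column min/max are stored per micro-partition
SELECT COUNT(*), MIN(order_date), MAX(order_date) FROM orders;

-- Requires compute: a filtered count is not a stored statistic, so
-- micro-partitions must be read (pruning still limits how many)
SELECT COUNT(*) FROM orders WHERE status = 'OPEN';
```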
A clustering key is a column (or expression) declared on a Snowflake table that tells Snowflake to physically sort and co-locate rows with similar values into the same micro-partitions. When the key is a date column, all rows for a given month land in the same small set of partitions. A query filtering on that month then only reads those partitions — the rest are skipped by partition pruning.
Without a clustering key, data arrives in load order. On a table loaded from hundreds of daily files over several years, any given month's rows may be scattered across thousands of micro-partitions. A query for one month must scan all of them. With an effective clustering key, average_depth drops toward 1, meaning partitions barely overlap and a predicate touches only the few partitions that actually contain matching values.
The key metric from SYSTEM$CLUSTERING_INFORMATION() is average_depth: a value near 1 = excellent clustering; a value near the total partition count = no clustering benefit. Automatic Clustering (Enterprise+) runs a background service that continuously re-clusters data as new rows arrive.
-- Define clustering key on a multi-billion-row fact table
ALTER TABLE sales CLUSTER BY (order_date, region);
-- Inspect clustering quality
SELECT
PARSE_JSON(SYSTEM$CLUSTERING_INFORMATION('sales','(order_date, region)'))
:average_depth::FLOAT AS clustering_depth;
-- This query now prunes away non-January partitions
SELECT product_id, SUM(revenue)
FROM sales
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31'
AND region = 'APAC'
GROUP BY product_id;
A Materialized View (MV) in Snowflake is a pre-computed, physically stored snapshot of a SELECT query's result. Unlike a regular view that re-executes its SQL every time it is queried, an MV stores the result as a table that Snowflake automatically keeps fresh in the background as the base table changes. Reads from an MV are as fast as reads from a regular table.
Snowflake refreshes MVs incrementally — only the rows affected by changes to the base table are reprocessed, not the entire result set. This background refresh uses Snowflake-managed serverless compute and is billed separately from warehouse usage. The cost model is: storage for the materialized result + serverless compute credits for incremental maintenance.
Use an MV when: a complex aggregation is queried frequently, the base table changes predictably (not constantly), and query latency matters more than the added storage and maintenance cost. Skip MVs when: the base table is updated at very high frequency (refresh cost exceeds the query savings), the query joins multiple tables (MVs only support a single base table in Snowflake), or the query uses non-deterministic functions. Requires Enterprise edition or higher.
-- Create a Materialized View of daily sales aggregates
CREATE MATERIALIZED VIEW daily_sales_mv AS
SELECT
DATE_TRUNC('day', order_ts) AS sale_date,
region,
SUM(amount) AS daily_revenue,
COUNT(*) AS order_count
FROM orders
GROUP BY 1, 2;
-- Query the MV — reads pre-computed data, not the full orders table
SELECT * FROM daily_sales_mv
WHERE sale_date >= '2024-01-01';
These three object types all provide an abstraction layer over raw tables, but they differ fundamentally in whether they store data physically, how they refresh, and what query patterns they support.
| Dimension | Regular View | Materialized View | Dynamic Table |
|---|---|---|---|
| Physical data stored | No — virtual | Yes | Yes |
| Auto-refreshed | N/A — always current | Yes, incrementally by Snowflake | Yes, based on TARGET_LAG |
| Refresh trigger | On query (inline rewrite) | Base table DML (incremental) | Snowflake scheduler (lag-based) |
| Multi-table joins | Yes | No — single base table only | Yes |
| Pipeline chaining | No | No | Yes — chain DTs together |
| Extra storage cost | None | Yes | Yes |
| Background compute cost | None | Serverless (incremental) | Warehouse or serverless |
| Best for | Simple aliases, access control | Repeated aggregate on one table | Multi-step ELT pipelines |
Choosing between them: use a View when you only need a query abstraction or access restriction with no performance concern. Use a Materialized View when you need fast repeated reads of a single-table aggregate with predictable change frequency. Use a Dynamic Table when you need declarative, refreshed pipeline results across multiple joined tables or multi-step transformation chains.
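For comparison with the Materialized View example above, a minimal Dynamic Table sketch — table, column, and warehouse names here are hypothetical:

```sql
-- Dynamic Table: multi-table join refreshed to stay within a 10-minute lag
CREATE DYNAMIC TABLE orders_enriched
  TARGET_LAG = '10 minutes'        -- Snowflake schedules refreshes to meet this lag
  WAREHOUSE  = transform_wh        -- compute used for the refreshes
AS
SELECT o.order_id, o.amount, o.order_ts, c.region
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```

Note the two capabilities an MV lacks: the join across two base tables, and the declarative `TARGET_LAG` that lets Snowflake decide when to refresh.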
Snowflake query optimization covers four levers, each addressing a different root cause. Using the wrong lever wastes money without helping performance.
Warehouse sizing (scale up) helps when a single query is slow due to complexity — large sorts, multi-way joins, or spilling to disk. Check Query Profile for Bytes Spilled to Local/Remote Storage. Doubling the warehouse size doubles available memory and parallelism.
Multi-cluster scale-out helps when many users are queuing, not when individual queries are slow. Check Queue Time in Query History. Scaling out adds parallel clusters, each serving a different concurrent query.
Clustering keys help when a large table (1 TB+) is scanned with selective WHERE predicates and partition scanned / partition total is high in Query Profile. Defining a clustering key on the filter column co-locates relevant micro-partitions, reducing scan volume dramatically.
Result cache helps when identical query text is re-run against unchanged data: Snowflake returns the stored result at zero compute cost. Standardize query text across users (no variable timestamps, no per-user LIMIT injections) and keep dashboard SQL deterministic. Each reuse costs zero credits.
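The first three levers map to one-line DDL statements. A sketch with hypothetical warehouse and table names:

```sql
-- Lever 1: scale up — more memory/parallelism for a single complex query
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'XLARGE';

-- Lever 2: scale out — add clusters to absorb concurrency, not speed up one query
ALTER WAREHOUSE bi_wh SET MIN_CLUSTER_COUNT = 1, MAX_CLUSTER_COUNT = 4;

-- Lever 3: clustering key — co-locate micro-partitions for selective filters
ALTER TABLE events CLUSTER BY (event_date);
```

Matching the lever to the symptom (spilling, queuing, or poor pruning) is the whole skill; any of these applied to the wrong symptom only adds cost.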
Additional optimizations:
- Select only needed columns — never `SELECT *` on wide tables.
- Push selective filters early in CTEs; avoid late filtering after expensive joins.
- Use `UNION ALL` instead of `UNION` when duplicates are already excluded.
- Cast VARIANT paths to typed columns to enable statistics-based pruning.
- Use `SHOW TABLES` / `INFORMATION_SCHEMA` metadata queries instead of scanning data for schema information.
A Snowflake Stream is a named object that records row-level DML changes (INSERT, UPDATE, DELETE) made to a source table since the stream was last consumed. It works like a bookmark: each time you read the stream inside a DML transaction, the bookmark advances, and those changes are removed from the stream. Streams implement Change Data Capture without any external tooling or log-tailing.
Every row returned by a stream includes three metadata columns:
- `METADATA$ACTION` — either `'INSERT'` or `'DELETE'`. An UPDATE to a row appears as a DELETE of the old version plus an INSERT of the new version — a before/after pair.
- `METADATA$ISUPDATE` — `TRUE` on both rows of an UPDATE pair, distinguishing them from net-new inserts and standalone deletes.
- `METADATA$ROW_ID` — a unique, stable identifier for the underlying row, allowing the DELETE/INSERT halves of an UPDATE pair to be matched.
Stream types: Standard captures all DML (INSERT + UPDATE + DELETE). Append-only captures only INSERTs, ignoring updates and deletes — more efficient for append-heavy event tables. Insert-only is for External Tables.
-- Create a standard stream on the orders table
CREATE STREAM orders_stream ON TABLE orders;
-- Inspect changes since last consumption
SELECT * FROM orders_stream;
-- Consume the stream: copy inserts into a staging table
INSERT INTO orders_staging
SELECT order_id, customer_id, amount, order_ts
FROM orders_stream
WHERE METADATA$ACTION = 'INSERT'
AND METADATA$ISUPDATE = FALSE;
-- After the INSERT above commits, the stream advances its offset
SELECT SYSTEM$STREAM_HAS_DATA('orders_stream');
A Task is a named Snowflake object that executes a single SQL statement (or a stored procedure call) on a defined schedule. Tasks are the native scheduling mechanism for data transformations inside Snowflake — no external orchestration tool is required for pipelines that can be expressed as SQL.
Schedule options: SCHEDULE = '5 MINUTE' runs the task every 5 minutes. SCHEDULE = 'USING CRON 0 6 * * MON-FRI America/New_York' uses standard cron syntax with a timezone. Tasks run on a user-specified warehouse by default, but serverless tasks (no warehouse needed) are also supported and billed per compute-second.
Task DAGs: a root task holds the schedule; child tasks use AFTER <parent_task> to declare dependencies. When the root fires, Snowflake runs the tree in dependency order, fan-out and fan-in patterns included. Tasks are created in SUSPENDED state and must be activated with ALTER TASK ... RESUME.
Combining Streams and Tasks is the idiomatic CDC pipeline: the task body checks SYSTEM$STREAM_HAS_DATA('stream_name') and only proceeds when new changes exist, avoiding empty executions.
-- Root task: runs every 10 minutes on the transform warehouse
CREATE TASK load_staging
WAREHOUSE = transform_wh
SCHEDULE = '10 MINUTE'
AS
CALL load_staging_proc();
-- Child task: runs after load_staging completes
CREATE TASK build_aggregates
WAREHOUSE = transform_wh
AFTER load_staging
AS
INSERT INTO daily_summary SELECT ...;
-- Both tasks must be resumed before they will fire
ALTER TASK build_aggregates RESUME;
ALTER TASK load_staging RESUME;
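The stream-guard check described above can also be attached declaratively with a WHEN clause, so Snowflake skips the run entirely (at no warehouse cost) when the stream is empty. A sketch reusing the orders_stream objects from the Streams example:

```sql
-- Task fires every 10 minutes, but only runs when the stream has new rows
CREATE TASK load_from_stream
  WAREHOUSE = transform_wh
  SCHEDULE  = '10 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  INSERT INTO orders_staging
  SELECT order_id, customer_id, amount, order_ts
  FROM orders_stream
  WHERE METADATA$ACTION = 'INSERT';
```

Consuming the stream inside the task's DML advances the offset automatically, so the next run sees only newer changes.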
In Snowflake's RBAC model, privileges are granted to roles, and roles are granted to users or other roles (creating a role hierarchy). A user acquires the union of all privileges from every role in their active role tree. At any moment, a session runs under one active role; the user can switch with USE ROLE.
Five system-defined roles exist in every account and cannot be dropped:
- `ACCOUNTADMIN` — highest privilege; manages billing, account parameters, and all objects. Should be used sparingly and with MFA.
- `SYSADMIN` — manages all databases, schemas, warehouses, and objects. The recommended role for day-to-day data engineering work.
- `SECURITYADMIN` — creates and manages users and roles; grants and revokes privileges. Cannot read data.
- `USERADMIN` — creates users and roles (a subset of SECURITYADMIN's scope).
- `PUBLIC` — automatically granted to every user; assign here only objects you want universally accessible.
Best practice: create custom roles and grant them up to SYSADMIN so ACCOUNTADMIN retains full visibility through inheritance. Never do daily work as ACCOUNTADMIN.
-- Grant SELECT on a table to a custom role
GRANT SELECT ON TABLE sales.orders TO ROLE analyst_role;
-- Grant the warehouse to the role
GRANT USAGE ON WAREHOUSE reporting_wh TO ROLE analyst_role;
-- Assign the custom role to a user
GRANT ROLE analyst_role TO USER alice;
-- Grant the custom role UP to SYSADMIN for visibility
GRANT ROLE analyst_role TO ROLE SYSADMIN;
-- Inspect what privileges a role holds
SHOW GRANTS TO ROLE analyst_role;
Column-level security in Snowflake is implemented through Dynamic Data Masking (DDM) policies. A masking policy defines a SQL expression — typically a CASE statement on CURRENT_ROLE() or IS_ROLE_IN_SESSION() — that returns either the real column value or a masked substitute depending on who is querying. The masking is applied at query runtime; the original data is always stored unmasked in storage.
Key properties: the same policy can be applied to many columns; the policy can be changed or replaced without touching the table DDL; a privileged role always sees the real value; less-privileged roles see a masked version (e.g., ****, NULL, a truncated value, or a hash). Policy application is transparent to the user — their SQL is unchanged, they simply receive different output.
A Conditional Masking Policy can additionally inspect other column values in the same row when deciding how to mask (e.g., only mask the SSN column if the is_pii flag is true). This makes the policies context-sensitive.
-- Create a masking policy that hides emails from non-privileged roles
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
CASE
WHEN IS_ROLE_IN_SESSION('PII_VIEWER') THEN val
ELSE REGEXP_REPLACE(val, '.+@', '****@')
END;
-- Apply the policy to the email column
ALTER TABLE customers
MODIFY COLUMN email
SET MASKING POLICY email_mask;
-- Analyst (no PII_VIEWER role) sees: ****@example.com
-- PII_VIEWER role sees: alice@example.com
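The conditional masking described above can be sketched like this — the `ssn` and `is_pii` column names are hypothetical, and the policy signature's extra argument is what makes it conditional:

```sql
-- Conditional policy: mask SSN only when the row is flagged as PII
CREATE MASKING POLICY ssn_conditional AS (val STRING, is_pii BOOLEAN) RETURNS STRING ->
  CASE WHEN is_pii THEN '***-**-****' ELSE val END;

-- USING lists the masked column first, then the columns fed to the extra arguments
ALTER TABLE customers
  MODIFY COLUMN ssn
  SET MASKING POLICY ssn_conditional USING (ssn, is_pii);
```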
A Row Access Policy (RAP) is a Snowflake security object that transparently injects an additional filter predicate into every query that touches the protected table. From the querying user's perspective, their SQL is unchanged — they simply receive fewer rows. The policy's logic determines which rows are visible based on the current user's identity or role.
The policy body is a SQL Boolean expression referencing CURRENT_USER(), CURRENT_ROLE(), or IS_ROLE_IN_SESSION(). A common pattern uses a mapping table: the policy joins the mapping table to determine which regions or tenants the current role may see, then returns only those rows.
Multiple policies can be applied to a single table; all must return TRUE for a row to be visible (AND logic). Policies can also be applied to views. Row Access Policies are evaluated after column-level masking policies.
-- Mapping table: role -> allowed region
CREATE TABLE region_access_map (
role_name VARCHAR,
region VARCHAR
);
INSERT INTO region_access_map VALUES
('EMEA_ANALYST', 'EMEA'),
('APAC_ANALYST', 'APAC'),
('GLOBAL_ANALYST', 'GLOBAL');
-- Row Access Policy: show all rows to GLOBAL_ANALYST,
-- otherwise filter by the mapped region
CREATE ROW ACCESS POLICY region_rap AS (row_region VARCHAR) RETURNS BOOLEAN ->
EXISTS (
SELECT 1 FROM region_access_map
WHERE role_name = CURRENT_ROLE()
AND (region = 'GLOBAL' OR region = row_region)
);
-- Attach the policy to the orders table on the region column
ALTER TABLE orders
ADD ROW ACCESS POLICY region_rap ON (region);
Snowflake enforces encryption universally — it cannot be disabled. Every byte of customer data is always encrypted, both when stored in cloud object storage and when moving across networks.
Encryption at rest uses AES-256-GCM in a four-tier hierarchical key model:
- File keys encrypt individual micro-partition files.
- Table keys encrypt the file keys for a given table.
- Account master keys encrypt the table keys for an account.
- Root keys (Snowflake-held) encrypt the account master keys.
Key rotation happens automatically: Snowflake retires and replaces active table and account master keys on a regular cadence (roughly every 30 days). Enterprise edition and higher can additionally enable annual rekeying with ALTER ACCOUNT SET PERIODIC_DATA_REKEYING = TRUE, which re-encrypts data protected by retired keys older than one year. The Tri-Secret Secure model (Business Critical+) adds a customer-managed key on top of this hierarchy.
Encryption in transit uses TLS 1.2 or higher for all connections: between client drivers (JDBC, ODBC, Python) and the Snowflake service endpoint, and between Snowflake's internal services and cloud object storage. Certificate pinning and OCSP validation are used to prevent man-in-the-middle attacks.
Customers in regulated industries should note that Snowflake's encryption is not customer-managed by default; for that, they need Business Critical edition and Tri-Secret Secure.
Tri-Secret Secure is Snowflake's highest tier of data protection. It ensures that data can only be decrypted when two independent keys are simultaneously present: Snowflake's own master key and a Customer-Managed Key (CMK) held in the customer's own cloud KMS (AWS KMS or Azure Key Vault). Neither party can decrypt the data alone.
In the standard key model, Snowflake holds all key hierarchy tiers. In Tri-Secret Secure, the account master key is wrapped not just by Snowflake's root key but also by the customer's CMK. Snowflake must call the customer's KMS every time it needs to access encrypted data. If the customer revokes the CMK, all data in that account becomes immediately inaccessible — even to Snowflake engineers. This gives customers provable data sovereignty.
Use cases: financial institutions with regulatory mandates, government agencies, healthcare organizations requiring proof they can revoke Snowflake's access at any time. Requires Business Critical or Virtual Private Snowflake edition plus a Tri-Secret Secure subscription add-on.
Operational risk: if the customer loses or deletes the CMK without a backup, all data is permanently lost. The customer bears full responsibility for CMK lifecycle management.
-- Conceptual setup only — Tri-Secret Secure is not enabled through self-service DDL:
-- 1. Create a key in your cloud KMS (e.g., AWS KMS or Azure Key Vault) and grant
--    Snowflake's service identity permission to use it.
-- 2. Provide the key reference (e.g., an Azure Key Vault key URL) to Snowflake,
--    who activate Tri-Secret Secure for the account; the exact steps are cloud-specific.
-- 3. From then on, revoking Snowflake's access to the KMS key immediately renders
--    the account's data unreadable.
Object Tags are key-value string metadata labels that can be attached to any Snowflake object — accounts, databases, schemas, tables, columns, warehouses, or other named resources. Tags give governance teams a structured way to label assets by sensitivity level, data domain, cost center, or regulatory category without changing the underlying schema.
Tag lineage is the automatic propagation behavior: a tag set on a schema propagates to all tables in that schema; a tag set on a table propagates to all its columns. Tags can be overridden at a lower level. This means you can mark an entire schema as sensitivity=high and every column inherits that classification automatically, saving thousands of individual tag assignments.
Data Classification uses Snowflake's SYSTEM$CLASSIFY() function (or the automated classification in Snowsight) to automatically inspect column names and sample values, then suggest PII categories: personal name, email address, phone number, SSN, date of birth, credit card number, IP address, and others. You can accept the suggestions and apply them as tags.
Governance views: SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES shows every tag assignment across the account. Combined with ACCESS_HISTORY, you can produce reports like 'who queried tagged PII columns in the last 30 days'.
-- Create a tag
CREATE TAG sensitivity ALLOWED_VALUES 'high', 'medium', 'low';
-- Apply a tag to a table (propagates to all columns)
ALTER TABLE customers SET TAG sensitivity = 'high';
-- Apply a different tag directly to one column
ALTER TABLE customers MODIFY COLUMN ssn SET TAG sensitivity = 'high';
-- Run automated classification on a table
CALL SYSTEM$CLASSIFY('mydb.myschema.customers', {'auto_tag': true});
Access History records every query that accessed specific database objects — tracking exactly which tables and columns were read or written, by which user, running which role, at what time. It is the foundation for data lineage auditing, regulatory compliance (GDPR, CCPA, HIPAA), and detecting over-privileged accounts.
The primary view is SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY. Key columns include:
- `QUERY_ID` and `QUERY_START_TIME` — the query being audited.
- `USER_NAME` and `ROLE_NAME` — who ran the query.
- `DIRECT_OBJECTS_ACCESSED` — tables/views the SQL explicitly named.
- `BASE_OBJECTS_ACCESSED` — the underlying base tables behind any views, including column-level granularity.
- `OBJECTS_MODIFIED` — tables written to via DML/COPY.
Access History has a latency of up to 3 hours before records appear in the ACCOUNT_USAGE views. For compliance-sensitive queries, this means near-real-time auditing is not possible through this view alone — pair it with query history and login history for a fuller picture.
Practical use: run a weekly query across ACCESS_HISTORY and TAG_REFERENCES to produce a report of which users accessed columns tagged as sensitivity=high, satisfying GDPR Article 30 record-keeping requirements.
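A sketch of that weekly report, joining ACCESS_HISTORY to TAG_REFERENCES. This assumes a tag named SENSITIVITY as in the earlier tagging example, relies on the documented JSON shape of BASE_OBJECTS_ACCESSED, and for brevity matches on column name only — a production version would also match the fully qualified table name:

```sql
-- Who read columns tagged sensitivity = 'high' in the last 30 days?
SELECT ah.user_name,
       obj.value:"objectName"::STRING AS table_name,
       col.value:"columnName"::STRING AS column_name,
       ah.query_start_time
FROM snowflake.account_usage.access_history ah,
     LATERAL FLATTEN(input => ah.base_objects_accessed) obj,
     LATERAL FLATTEN(input => obj.value:"columns") col
JOIN snowflake.account_usage.tag_references tr
  ON  tr.column_name = col.value:"columnName"::STRING
  AND tr.tag_name  = 'SENSITIVITY'
  AND tr.tag_value = 'high'
WHERE ah.query_start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP());
```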
Secure Data Sharing lets a Snowflake account (the data provider) grant another Snowflake account (the data consumer) live read access to specific objects — without ever copying, moving, or exporting the data. The consumer's Virtual Warehouse reads directly from the provider's cloud object storage using a metadata pointer. From the consumer's perspective, the shared data appears as a regular read-only database.
The Share object is the key primitive: the provider adds objects (tables, external tables, secure views, secure materialized views, secure UDFs) to a Share, then grants the Share to one or more consumer accounts. The Share is a metadata reference — it contains zero data bytes.
Two consumer types exist:
- Full Snowflake account — an existing Snowflake customer; uses their own Virtual Warehouse to query the share. Zero extra cost to the provider for compute.
- Reader Account — a free Snowflake account created and managed by the provider for consumers who do not have their own Snowflake subscription. The provider pays for the reader account's compute.
Limitation: cross-region or cross-cloud sharing requires the provider to first replicate data to a local account in the consumer's region before sharing. Shared data cannot be further re-shared by consumers by default.
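A minimal provider-to-consumer flow — all database, share, and account identifiers below are placeholders:

```sql
-- Provider side: create the share and grant object access to it
CREATE SHARE sales_share;
GRANT USAGE  ON DATABASE sales_db               TO SHARE sales_share;
GRANT USAGE  ON SCHEMA   sales_db.public        TO SHARE sales_share;
GRANT SELECT ON TABLE    sales_db.public.orders TO SHARE sales_share;

-- Provider side: make the share visible to a consumer account
ALTER SHARE sales_share ADD ACCOUNTS = consumer_org.consumer_account;

-- Consumer side: mount the share as a read-only database
CREATE DATABASE shared_sales FROM SHARE provider_org.provider_account.sales_share;
```

No data is copied at any step; the consumer's database is a metadata pointer into the provider's storage.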
The Snowflake Data Marketplace is a hub within Snowflake where data providers publish and monetize data products, and data consumers discover and access them — all powered by Secure Data Sharing. When a consumer subscribes to a listing, they receive a live Share pointing to the provider's data, not a static file download. The data is always current, and no ETL pipeline is needed.
Three listing types are offered:
- Free listings — immediately accessible to any Snowflake customer in the same region as the provider. Common for sample datasets, open government data, and marketing datasets.
- Paid listings — require a commercial agreement brokered through Snowflake's billing system. The provider sets a subscription price and Snowflake handles invoicing.
- Personalized listings — the consumer submits a request describing their data need; the provider manually reviews it, customizes the share, and approves. Used for proprietary or compliance-sensitive data.
Data categories available: financial markets (equities, FX, options), weather and climate, demographics and consumer behavior, ESG and sustainability data, geospatial and location data, healthcare and clinical trials, and firmographic business data. Major providers include Bloomberg, Refinitiv, AccuWeather, Foursquare, SafeGraph, and dozens of government agencies publishing open data.
Data Clean Rooms (DCRs) are privacy-preserving environments where two or more parties can run collaborative analytics on a union of their datasets without either party being able to see the other's raw records. Snowflake implements Clean Rooms on top of Secure Data Sharing and the Native App Framework.
The privacy problem they solve: an advertiser wants to measure how many of their customers saw a publisher's ad and subsequently made a purchase. This requires matching two customer lists. But the advertiser cannot share its customer list with the publisher and vice versa — both contain sensitive PII. A DCR lets them run the overlap-and-attribution query without either side ever getting a plain-text view of the other's underlying customer records.
Snowflake's Clean Room architecture enforces privacy through:
- Pre-approved query templates — only specific, vetted SQL queries may be executed; no ad-hoc access.
- Aggregation thresholds — results are suppressed if the matching cohort is smaller than a minimum size (e.g., <500 rows), preventing re-identification.
- No raw data export — results are returned as aggregates; the underlying rows are never exposed.
- Differential privacy options — statistical noise can be added to outputs to provide provable privacy guarantees.
Use cases span advertising (media measurement, audience activation), financial services (transaction reconciliation between banks), and healthcare (de-identified outcomes research across payer and provider datasets).
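The aggregation-threshold idea can be illustrated as a vetted query template — the tables, the hashed join key, and the 500-row floor below are hypothetical, and a real Clean Room would enforce this server-side rather than trusting the query text:

```sql
-- Overlap measurement template: only aggregate counts leave the room,
-- and cohorts below the minimum size are suppressed entirely
SELECT a.segment,
       COUNT(*) AS matched_customers
FROM advertiser_customers a
JOIN publisher_exposures  p
  ON a.email_hash = p.email_hash          -- match on hashed identifiers, never raw PII
GROUP BY a.segment
HAVING COUNT(*) >= 500;                   -- aggregation threshold against re-identification
```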
Snowpark is Snowflake's developer framework that lets you write Python, Java, or Scala code that executes inside the Snowflake engine rather than pulling data to an external machine. The core abstraction is the Snowpark DataFrame, which works like Apache Spark or pandas DataFrames: you compose transformations lazily, and Snowflake compiles them into SQL and executes them on the Virtual Warehouse.
The primary benefit is that data never leaves Snowflake. Instead of pulling 100 GB to a Python server, applying transformations, and pushing results back, the transformation runs where the data lives — in Snowflake compute. This eliminates egress costs, reduces latency, and scales with warehouse size.
Deployment targets in Snowpark:
- Snowpark Stored Procedures — Python/Java/Scala logic callable via `CALL proc_name()`, running on the caller's warehouse.
- User-Defined Functions (UDFs) — scalar or tabular functions callable from any SQL query.
- User-Defined Aggregate Functions (UDAFs) — custom aggregations in Python/Java.
- Snowpark ML — train and deploy scikit-learn, XGBoost, LightGBM models entirely inside Snowflake.
- Snowpark Container Services — run arbitrary Docker containers within Snowflake's network for full-custom workloads.
# Python Snowpark: run a transformation inside Snowflake
from snowflake.snowpark import Session

# connection_params: dict with account, user, password, warehouse, database, schema
session = Session.builder.configs(connection_params).create()
# Build a lazy DataFrame — nothing runs yet
df = session.table('raw_events')
df_clean = (
df.filter(df['event_type'] == 'purchase')
.select('user_id', 'amount', 'event_ts')
.group_by('user_id')
.agg({'amount': 'sum'})
)
# Action: push result into Snowflake -- runs as SQL on the warehouse
df_clean.write.save_as_table('user_purchase_summary', mode='overwrite')
Snowflake Native Apps are full applications that a provider packages once and consumers install directly into their own Snowflake account. The Native App Framework is the platform layer that enables this — combining Snowpark code, data shares, Streamlit UI components, stored procedures, and UDFs into a versioned, deployable bundle called an Application Package.
The key architectural difference from traditional SaaS is where code and data meet: in conventional SaaS the customer's data is sent to the vendor's environment (requiring data egress and trust). In a Native App, the application is brought to the customer's data — the app code runs in the consumer's Snowflake account, using the consumer's compute, with no data leaving the consumer's environment.
Lifecycle: the provider develops the app locally, creates an Application Package, uploads it to the Marketplace or shares a private URL. Consumers find the app in their Snowflake Marketplace, click install, grant the requested privileges, and the app runs inside their account. The provider receives telemetry (if the consumer consents) but never sees the consumer's data.
- Use cases: data enrichment APIs (address validation, geocoding), compliance audit tools that run inside a regulated environment, ML inference apps that score your own data, analytics dashboards with managed SQL logic.
An External Table is a read-only Snowflake object that maps a schema onto files stored in an external stage (S3, Azure Blob, GCS) without loading the data into Snowflake-managed storage. The files stay in your own cloud bucket; Snowflake queries them in place by scanning the raw files through the Virtual Warehouse.
External Tables support partitioning based on file path expressions (e.g., year/month/day directory structure) which allows partition pruning similar to Hive-style partitioning. The partition metadata is maintained in a Snowflake-managed catalog and must be kept current with ALTER EXTERNAL TABLE ... REFRESH or automated via cloud event notifications.
Key limitations compared to internal tables: read-only (no DML), no Time Travel, no Fail-safe, no automatic micro-partition optimization, slower query performance since data is not stored in Snowflake's columnar compressed format, and schema must be explicitly defined via the VALUE:field::TYPE VARIANT path syntax.
When to use: querying a data lake without ingesting it (cost-sensitive archival data), sharing data already in S3/Azure Blob with Snowflake analysts, compliance scenarios where data must remain in a specific bucket, or bridging a data lake with Snowflake analytics without full migration.
-- Create an external table on an S3 stage. Parquet fields are accessed by name
-- through VALUE; the partition column must be derived from METADATA$FILENAME
-- (here assuming paths like clickstream/2024-01-01/part-0.parquet).
CREATE EXTERNAL TABLE ext_clickstream (
event_date DATE AS TO_DATE(SPLIT_PART(METADATA$FILENAME, '/', 2)),
event_ts TIMESTAMP_NTZ AS (VALUE:event_ts::TIMESTAMP_NTZ),
user_id VARCHAR AS (VALUE:user_id::VARCHAR),
event_type VARCHAR AS (VALUE:event_type::VARCHAR)
)
PARTITION BY (event_date)
LOCATION = @my_s3_stage/clickstream/
AUTO_REFRESH = TRUE
FILE_FORMAT = (TYPE = 'PARQUET');
-- Refresh metadata after new files arrive (if not using AUTO_REFRESH)
ALTER EXTERNAL TABLE ext_clickstream REFRESH;
A multi-cluster warehouse is a Virtual Warehouse configured with a minimum and maximum cluster count. Instead of a single MPP cluster, Snowflake can dynamically spin up additional identical clusters to serve queued queries in parallel, eliminating the most common cause of poor concurrency performance: the query queue.
Two operating modes:
- Auto-scale mode (default) — Snowflake starts a new cluster only when the existing clusters have a queue forming. Once the queue clears and the added cluster has been idle for the Auto-Suspend period, it shuts down. This is credit-efficient for variable concurrency patterns.
- Maximized mode — all clusters in the range are always running, regardless of load. Use this when you have consistently high, predictable concurrency and cannot afford any queue latency.
Two scaling policies control how aggressively new clusters start:
- STANDARD (default) — starts a new cluster as soon as a single query is queued.
- ECONOMY — starts a new cluster only when Snowflake estimates there is enough queued work to keep it busy for at least 6 minutes. Saves credits at the cost of higher queue latency for occasional spikes.
Requires Enterprise edition. Each additional cluster counts as a full warehouse credit spend at the configured T-shirt size.
CREATE WAREHOUSE concurrent_wh
WAREHOUSE_SIZE = 'LARGE'
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 5
SCALING_POLICY = 'STANDARD'
AUTO_SUSPEND = 120
AUTO_RESUME = TRUE;
A Resource Monitor is a named Snowflake object that watches credit consumption for one or more Virtual Warehouses (or the entire account) and automatically takes action when configurable thresholds are crossed. It is the primary guardrail against runaway warehouse spend — for example, preventing a poorly-written query or a forgotten warehouse from consuming an entire month's budget.
A Resource Monitor defines a credit quota (the total credits allowed within the reset period) and one or more threshold triggers (expressed as percentages of the quota). Each trigger has an action:
- `NOTIFY` — sends an email alert to the monitor's notification list but takes no action against the warehouse.
- `SUSPEND` — suspends the warehouse after all currently running queries complete.
- `SUSPEND_IMMEDIATE` — immediately kills all running queries and suspends the warehouse without waiting.
Reset schedule: DAILY, WEEKLY, MONTHLY, or YEARLY — the quota resets on this cadence. Monitors can be set at account level (applies to all warehouses) or assigned to individual warehouses.
-- Create a Resource Monitor: 500 credits/month, suspend at 90%, alert at 75%
CREATE RESOURCE MONITOR monthly_budget
CREDIT_QUOTA = 500
FREQUENCY = MONTHLY
START_TIMESTAMP = IMMEDIATELY
TRIGGERS
ON 75 PERCENT DO NOTIFY
ON 90 PERCENT DO SUSPEND
ON 100 PERCENT DO SUSPEND_IMMEDIATE;
-- Attach the monitor to a specific warehouse
ALTER WAREHOUSE reporting_wh SET RESOURCE_MONITOR = monthly_budget;
ELT (Extract, Load, Transform) loads raw data into Snowflake first, then transforms it inside using SQL or Snowpark. ETL (Extract, Transform, Load) transforms data in an external tool before loading it. Snowflake's MPP engine and separation-of-storage-from-compute make it strongly suited for ELT — the transformation step is where Snowflake's power is applied, not a place to avoid.
ELT advantages with Snowflake: no external transformation infrastructure to manage; transformations can access the full history of raw data at any time (Time Travel); warehouse can be sized specifically for heavy transform runs; SQL models are easy to version-control with dbt; Snowpark handles transformations that require Python or ML libraries.
Typical Snowflake ELT stack:
- Ingestion: Fivetran, Airbyte, or Snowpipe pulls data into a `raw` schema.
- Staging: dbt or Snowpark cleans and casts the raw data into typed staging tables.
- Transformation: dbt models join, aggregate, and enrich staging data into semantic layer tables.
- Serving: BI tools (Tableau, Looker, Power BI) query the semantic layer through a dedicated reporting warehouse.
-- ELT pattern: transform raw JSON into a typed table inside Snowflake
CREATE OR REPLACE TABLE orders_clean AS
SELECT
raw:order_id::NUMBER AS order_id,
raw:customer_id::VARCHAR AS customer_id,
raw:amount::DECIMAL(10,2) AS amount,
raw:order_ts::TIMESTAMP_NTZ AS order_ts
FROM raw.orders_raw
WHERE raw:status::VARCHAR = 'completed';
Anti-patterns in Snowflake fall into four categories: security misuse, cost mismanagement, query performance degradation, and data loading inefficiency.
| Anti-Pattern | Root Cause | Fix |
|---|---|---|
| Using ACCOUNTADMIN for daily work | Violates least-privilege; accidental privilege grants | Create custom roles; use SYSADMIN for object management |
| Never suspending idle warehouses | Compute bills tick even with zero queries | Set AUTO_SUSPEND = 60 seconds on all non-critical warehouses |
| High partition scan ratio on large tables | No clustering; data not co-located with query predicates | Define a clustering key on frequently filtered columns |
| Loading thousands of tiny files with COPY INTO | Small-file overhead: metadata cost > data cost | Merge files to 100–250 MB before loading; use Snowpipe for micro-batches |
| Breaking the result cache | Non-deterministic functions (CURRENT_TIMESTAMP, RANDOM) in repeated queries | Standardize query SQL; move timestamp filters to parameters |
| Storing all text as VARCHAR | No column statistics for numerics or dates; worse compression | Use typed columns (NUMBER, DATE, TIMESTAMP_NTZ) |
| UNION instead of UNION ALL | Implicit deduplication scan on large result sets | Use UNION ALL when input rows are already distinct |
| SELECT * on wide tables | Reads all columns; wastes I/O and memory on irrelevant data | Always specify required columns in SELECT list |
The single highest-impact change for most Snowflake accounts: set AUTO_SUSPEND = 60 on every warehouse that is not required to be always-on, and audit warehouses weekly in SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY for unexpected runtime.
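A sketch of that weekly audit query (7-day window shown; the view's latency of up to a few hours applies here too):

```sql
-- Credits burned per warehouse over the last 7 days, biggest spenders first
SELECT warehouse_name,
       SUM(credits_used) AS credits_7d
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_7d DESC;
```

Any warehouse with nontrivial credits but no known workload is a candidate for a shorter AUTO_SUSPEND or outright suspension.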
These three platforms dominate cloud data warehousing and each represents a distinct architectural philosophy.
Snowflake uses multi-cluster shared-data architecture: explicit Virtual Warehouses (MPP clusters) read from cloud object storage. Available on AWS, Azure, and GCP. Compute billed per credit-second while running; storage billed per TB/month compressed. Customers manage warehouse sizing and concurrency configuration.
BigQuery (Google Cloud only) is fully serverless — there are no clusters or warehouses to provision. The Dremel engine allocates compute slots automatically per query. Default pricing: on-demand (per TB of data scanned), or capacity pricing (flat-rate slot reservations). BigQuery abstracts all infrastructure; you just run SQL.
Amazon Redshift uses shared-nothing node clusters. RA3 nodes (modern Redshift) separate storage to S3 and compute to nodes, partially approaching Snowflake's model. Billed per node-hour (provisioned) or per RPU-second (Redshift Serverless). Deeper AWS service integration. Requires cluster sizing decisions (node type, count) and maintenance (vacuuming, sort/dist key tuning).
| Dimension | Snowflake | BigQuery | Redshift |
|---|---|---|---|
| Architecture | Multi-cluster shared-data | Serverless (Dremel) | Shared-nothing cluster (RA3) |
| Cloud platforms | AWS, Azure, GCP | GCP only | AWS only |
| Compute model | Virtual Warehouses (user-managed) | Fully serverless, auto-slot allocation | Provisioned nodes or Serverless RPU |
| Compute billing | Credits/second while warehouse runs | Per TB scanned (on-demand) or slots | Node-hours or RPU-seconds |
| Storage billing | Per TB/month compressed | Per TB/month in GCS | Per TB/month in S3 (RA3) |
| Idle compute cost | Zero when warehouse suspended | Zero (serverless) | Full node-hour cost if always-on |
| Cross-cloud support | Yes (replicate across clouds) | No | No |
| Semi-structured data | Native VARIANT type | Native JSON/ARRAY types | SUPER type (limited) |