
Mar 28, 2024 DSA-C02 Exam Crack Test Engine Dumps Training With 67 Questions
Obtain the DSA-C02 PDF Dumps Get 100% Outcomes Exam Questions For You To Pass
NEW QUESTION # 17
What is the formula for measuring skewness in a dataset?
- A. MODE - MEDIAN
- B. MEAN - MEDIAN
- C. (MEAN - MODE)/ STANDARD DEVIATION
- D. (3(MEAN - MEDIAN))/ STANDARD DEVIATION
Answer: D
Explanation:
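Pearson's median skewness coefficient measures skewness as 3(mean - median) / standard deviation. For a symmetric distribution such as the normal curve, the mean and the median coincide, so the coefficient is zero. A positive value indicates a right (positive) skew, and a negative value indicates a left (negative) skew.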
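The formula in option D can be checked on small samples. The following minimal Python sketch uses the standard-library statistics module. The sample values are hypothetical, and using the sample standard deviation (statistics.stdev) rather than the population one (statistics.pstdev) is an assumption; either choice keeps the sign of the result.

```python
import statistics

def pearson_median_skewness(values):
    """Pearson's median skewness coefficient: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption; pstdev also works)
    return 3 * (mean - median) / sd

symmetric = [1, 2, 3, 4, 5]      # hypothetical data: mean == median
right_skewed = [1, 2, 3, 4, 10]  # hypothetical data: the long right tail pulls the mean up

print(pearson_median_skewness(symmetric))     # 0.0
print(pearson_median_skewness(right_skewed))  # positive value
```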
NEW QUESTION # 18
To return the contents of a DataFrame as a Pandas DataFrame, which of the following methods can be used in the Snowpark API?
- A. REPLACE_TO_PANDAS
- B. CONVERT_TO_PANDAS
- C. TO_PANDAS
- D. SNOWPARK_TO_PANDAS
Answer: C
Explanation:
To return the contents of a DataFrame as a Pandas DataFrame, use the to_pandas method.
For example:
>>> python_df = session.create_dataframe(["a", "b", "c"])
>>> pandas_df = python_df.to_pandas()
NEW QUESTION # 19
Which term describes the visual depiction of data through the use of graphs, plots, and informational graphics?
- A. Data Virtualization
- B. Data Mining
- C. Data visualization
- D. Data Interpretation
Answer: C
Explanation:
Data visualization is the visual depiction of data through the use of graphs, plots, and informational graphics.
Its practitioners use statistics and data science to convey the meaning behind data in ethical and accurate ways.
NEW QUESTION # 20
Data providers add Snowflake objects (databases, schemas, tables, secure views, etc.) to a share using which of the following options?
- A. Grant privileges on objects directly to a share.
- B. Grant privileges on objects to a share via a third-party role.
- C. Grant privileges on objects to a share via Account role.
- D. Grant privileges on objects to a share via a database role.
Answer: A,D
Explanation:
What is a Share?
Shares are named Snowflake objects that encapsulate all of the information required to share a database.
Data providers add Snowflake objects (databases, schemas, tables, secure views, etc.) to a share using either or both of the following options:
Option 1: Grant privileges on objects to a share via a database role.
Option 2: Grant privileges on objects directly to a share.
You choose which accounts can consume data from the share by adding the accounts to the share.
After a database is created (in a consumer account) from a share, all the shared objects are accessible to users in the consumer account.
Shares are secure, configurable, and controlled completely by the provider account:
New objects added to a share become immediately available to all consumers, providing real-time access to shared data.
Access to a share (or any of the objects in a share) can be revoked at any time.
NEW QUESTION # 21
As a Data Scientist looking to use reader accounts, which of the following are correct considerations about reader accounts for third-party access?
- A. Data sharing is only possible between Snowflake accounts.
- B. Users in a reader account can query data that has been shared with the reader account, but cannot perform any of the DML tasks that are allowed in a full account, such as data loading, insert, update, and similar data manipulation operations.
- C. Reader accounts (formerly known as "read-only accounts") provide a quick, easy, and cost-effective way to share data without requiring the consumer to become a Snowflake customer.
- D. Each reader account belongs to the provider account that created it.
Answer: A
Explanation:
Data sharing is only supported between Snowflake accounts. As a data provider, you might want to share data with a consumer who does not already have a Snowflake account or is not ready to become a licensed Snowflake customer.
To facilitate sharing data with these consumers, you can create reader accounts. Reader accounts (formerly known as "read-only accounts") provide a quick, easy, and cost-effective way to share data without requiring the consumer to become a Snowflake customer.
Each reader account belongs to the provider account that created it. As a provider, you use shares to share databases with reader accounts; however, a reader account can only consume data from the provider account that created it.
So, through a reader account, a provider can share data with a consumer who is not a Snowflake customer.
NEW QUESTION # 22
Skewness of Normal distribution is ___________
- A. Positive
- B. 0
- C. Undefined
- D. Negative
Answer: B
Explanation:
Since the normal curve is symmetric about its mean, its skewness is zero. For a formal proof, refer to a statistics textbook or reference that covers the normal distribution in detail.
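The zero-skewness property of a symmetric distribution can also be seen with the moment (Fisher-Pearson) coefficient of skewness, m3 / m2^1.5, where m2 and m3 are the second and third central moments. This is a different measure from Pearson's median coefficient. A minimal Python sketch on a hypothetical sample whose values are mirrored around their mean:

```python
def moment_skewness(values):
    """Fisher-Pearson moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment (variance)
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical sample mirrored around 0, as a symmetric curve is mirrored around its mean.
symmetric = [-3, -1, -0.5, 0, 0.5, 1, 3]
print(moment_skewness(symmetric))  # 0.0: positive and negative cubed deviations cancel
```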
NEW QUESTION # 23
Select the correct mappings:
I. W Weights or Coefficients of independent variables in the Linear regression model --> Model Parameter
II. K in the K-Nearest Neighbour algorithm --> Model Hyperparameter
III. Learning rate for training a neural network --> Model Hyperparameter
IV. Batch Size --> Model Parameter
- A. I,II,III
- B. I,II
- C. II,III,IV
- D. III,IV
Answer: A
Explanation:
Hyperparameters in Machine learning are those parameters that are explicitly defined by the user to control the learning process. These hyperparameters are used to improve the learning of the model, and their values are set before starting the learning process of the model.
What are hyperparameters?
In Machine Learning/Deep Learning, a model is represented by its parameters. In contrast, training a model well also involves selecting the best hyperparameters for the learning algorithm. So, what are these hyperparameters? The answer is, "Hyperparameters are defined as the parameters that are explicitly defined by the user to control the learning process." Here the prefix "hyper" suggests that the parameters are top-level parameters that are used in controlling the learning process. The value of a hyperparameter is selected and set by the machine learning engineer before the learning algorithm begins training the model. Hence, hyperparameters are external to the model, and their values are not learned during the training process.
Some examples of Hyperparameters in Machine Learning
The k in kNN or K-Nearest Neighbour algorithm
Learning rate for training a neural network
Train-test split ratio
Batch Size
Number of Epochs
Branches in Decision Tree
Number of clusters in Clustering Algorithm
Model Parameters:
Model parameters are configuration variables that are internal to the model, and a model learns them on its own. Examples include the weights or coefficients (W) of independent variables in a linear regression model or an SVM, the weights and biases of a neural network, and the cluster centroids in clustering. Some key points for model parameters are as follows:
They are used by the model for making predictions.
They are learned by the model from the data itself.
These are usually not set manually.
These are part of the model and key to a machine learning algorithm.
Model Hyperparameters:
Hyperparameters are those parameters that are explicitly defined by the user to control the learning process.
Some key points for model hyperparameters are as follows:
These are usually defined manually by the machine learning engineer.
One cannot know the exact best value for hyperparameters for the given problem. The best value can be determined either by the rule of thumb or by trial and error.
Some examples of Hyperparameters are the learning rate for training a neural network, K in the KNN algorithm.
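The split between parameters and hyperparameters can be seen in a small training loop. This minimal Python sketch fits a one-feature linear regression by batch gradient descent. The learning rate and epoch count are hyperparameters fixed before training, while the weight w and bias b are model parameters learned from the data. The data set and the specific hyperparameter values are hypothetical choices for illustration.

```python
# Hyperparameters: chosen by the engineer before training; training does not learn them.
LEARNING_RATE = 0.05
EPOCHS = 2000

def train_linear_regression(xs, ys, learning_rate=LEARNING_RATE, epochs=EPOCHS):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # model parameters: learned from the data
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))  # dMSE/dw
        grad_b = 2 / n * sum(errors)                             # dMSE/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [2 * x + 1 for x in xs]  # hypothetical data generated from w = 2, b = 1
w, b = train_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Changing LEARNING_RATE or EPOCHS changes how training proceeds, but only w and b end up inside the fitted model.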
NEW QUESTION # 24
Which one of the following is not a key component when designing external functions in Snowflake?
- A. API Integration
- B. Remote Service
- C. Proxy Service
- D. UDF Service
Answer: D
Explanation:
What is an External Function?
An external function calls code that is executed outside Snowflake.
The remotely executed code is known as a remote service.
Information sent to a remote service is usually relayed through a proxy service.
Snowflake stores security-related external function information in an API integration.
External Function:
An external function is a type of UDF. Unlike other UDFs, an external function does not contain its own code; instead, the external function calls code that is stored and executed outside Snowflake.
Inside Snowflake, the external function is stored as a database object that contains information that Snowflake uses to call the remote service. This stored information includes the URL of the proxy service that relays information to and from the remote service.
Remote Service:
The remotely executed code is known as a remote service.
The remote service must act like a function. For example, it must return a value.
Snowflake supports scalar external functions; the remote service must return exactly one row for each row received.
Proxy Service:
Snowflake does not call a remote service directly. Instead, Snowflake calls a proxy service, which relays the data to the remote service.
The proxy service can increase security by authenticating requests to the remote service.
The proxy service can support subscription-based billing for a remote service. For example, the proxy service can verify that a caller to the remote service is a paid subscriber.
The proxy service also relays the response from the remote service back to Snowflake.
Examples of proxy services include:
Amazon API Gateway.
Microsoft Azure API Management service.
API Integration:
An integration is a Snowflake object that provides an interface between Snowflake and third-party services.
An API integration stores information, such as security information, that is needed to work with a proxy service or remote service.
An API integration is created with the CREATE API INTEGRATION command.
Users can write and call their own remote services, or call remote services written by third parties. These remote services can be written using any HTTP server stack, including cloud serverless compute services such as AWS Lambda.
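A remote service only has to behave like a scalar function over HTTP. Below is a minimal Python sketch of the request-handling logic, assuming the JSON batch format described in Snowflake's external-function documentation: a top-level "data" array whose rows start with a row number, which the response must echo. The function name handle_batch and the doubling logic are hypothetical, and the HTTP server or Lambda wrapper is omitted.

```python
import json

def handle_batch(request_body: str) -> str:
    """Hypothetical remote-service handler that doubles one numeric argument per row.

    Assumes a request like {"data": [[row_number, arg], ...]} and answers each
    row with [row_number, result].
    """
    rows = json.loads(request_body)["data"]
    # Scalar external function: exactly one output row for each input row.
    results = [[row_number, arg * 2] for row_number, arg in rows]
    return json.dumps({"data": results})

print(handle_batch('{"data": [[0, 10], [1, 21]]}'))  # {"data": [[0, 20], [1, 42]]}
```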
NEW QUESTION # 25
Which ones are the known limitations of using External function?
- A. Currently, external functions cannot be shared with data consumers via Secure Data Sharing.
- B. Currently, external functions must be scalar functions. A scalar external function returns a single value for each input row.
- C. An external function accessed through an AWS API Gateway private endpoint can be accessed only from a Snowflake VPC (Virtual Private Cloud) on AWS and in the same AWS region.
- D. External functions have more overhead than internal functions (both built-in functions and internal UDFs) and usually execute more slowly.
Answer: A,B,C,D
NEW QUESTION # 26
All aggregate functions except _____ ignore null values in their input collection
- A. Count(attribute)
- B. Sum
- C. Count(*)
- D. Avg
Answer: C
Explanation:
COUNT(*) counts every row, including rows whose values are NULL. COUNT(attribute), SUM, and AVG ignore NULL values in their input.
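Snowflake cannot run in a standalone snippet, so this sketch uses Python's built-in sqlite3 module instead. Its COUNT, SUM, and AVG follow the same standard SQL NULL handling described above. The table t and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (None,)])

count_star, count_x, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(x), SUM(x), AVG(x) FROM t"
).fetchone()
# COUNT(*) counts all 3 rows; COUNT(x), SUM(x), and AVG(x) skip the NULL row.
print(count_star, count_x, total, average)  # 3 2 3 1.5
```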
NEW QUESTION # 27
Which of the following is a useful tool for gaining insights into the relationship between features and predictions?
- A. numpy plots
- B. sklearn plots
- C. Partial dependence plots(PDP)
- D. FULL dependence plots (FDP)
Answer: C
Explanation:
Partial dependence plots (PDP) are a useful tool for gaining insights into the relationship between features and predictions. They help us understand how different values of a particular feature impact the model's predictions.
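A partial dependence curve is the model's average prediction when one feature is forced to each value on a grid, while the other features keep their observed values. A minimal pure-Python sketch of that computation follows. The model and data are hypothetical. scikit-learn offers a ready-made version in its sklearn.inspection module, but this sketch does not depend on it.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when the chosen feature is set to each grid value.

    predict: callable mapping a feature list to a number.
    rows: the dataset, as a list of feature lists.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))  # average over the other features' observed values
    return curve

def model(features):
    """Hypothetical fitted model: prediction = 2 * feature_0 + feature_1."""
    return 2 * features[0] + features[1]

data = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]  # hypothetical dataset
print(partial_dependence(model, data, feature_index=0, grid=[0.0, 1.0, 2.0]))  # [3.0, 5.0, 7.0]
```

Plotting the returned values against the grid gives the PDP for feature 0. Here it is a straight line with slope 2, which matches that feature's effect in the model.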
NEW QUESTION # 28
All Snowpark ML modeling and preprocessing classes are in the ________ namespace?
- A. snowflake.ml.modeling
- B. snowpark.ml.modeling
- C. snowflake.sklearn.modeling
- D. snowflake.scikit.modeling
Answer: A
Explanation:
All Snowpark ML modeling and preprocessing classes are in the snowflake.ml.modeling namespace. The Snowpark ML modules have the same name as the corresponding module from the sklearn namespace. For example, the Snowpark ML module corresponding to sklearn.calibration is snowflake.ml.modeling.calibration.
The xgboost and lightgbm modules correspond to snowflake.ml.modeling.xgboost and snowflake.ml.modeling.lightgbm, respectively.
Not all of the classes from scikit-learn are supported in Snowpark ML.
NEW QUESTION # 29
Which of the following functions support windowing?
- A. HASH_AGG
- B. LISTAGG
- C. EXTRACT
- D. ENCRYPT
Answer: A,B
Explanation:
What is a Window?
A window is a group of related rows. For example, a window might be defined based on timestamps, with all rows in the same month grouped in the same window. Or a window might be defined based on location, with all rows from a particular city grouped in the same window.
A window can consist of zero, one, or multiple rows. For simplicity, Snowflake documentation usually says that a window contains multiple rows.
What is a Window Function?
A window function is any function that operates over a window of rows.
A window function is generally passed two parameters:
A row. More precisely, a window function is passed 0 or more expressions. In almost all cases, at least one of those expressions references a column in that row. (Most window functions require at least one column or expression, but a few window functions, such as some rank-related functions, do not require an explicit column or expression.)
A window of related rows that includes that row. The window can be the entire table, or a subset of the rows in the table.
For non-window functions, all arguments are usually passed explicitly to the function, for example:
MY_FUNCTION(argument1, argument2, ...)
Window functions behave differently: although the current row is passed as an argument in the normal way, the window is passed through a separate clause, called an OVER clause, which defines the window (for example, with PARTITION BY).
LISTAGG
Returns the concatenated input values, separated by the delimiter string.
Window function
LISTAGG( [ DISTINCT ] <expr1> [, <delimiter> ] )
[ WITHIN GROUP ( <orderby_clause> ) ]
OVER ( [ PARTITION BY <expr2> ] )
HASH_AGG
Returns an aggregate signed 64-bit hash value over the (unordered) set of input rows. HASH_AGG never returns NULL, even if no input is provided. Empty input "hashes" to 0.
Window function
HASH_AGG( [ DISTINCT ] <expr> [ , <expr2> ... ] ) OVER ( [ PARTITION BY <expr3> ] )
HASH_AGG(*) OVER ( [ PARTITION BY <expr3> ] )
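What separates a window function from a plain aggregate is that every input row is kept and receives the result computed over its window. This minimal Python sketch emulates LISTAGG(name, ', ') WITHIN GROUP (ORDER BY name) OVER (PARTITION BY city) on hypothetical rows. It illustrates the semantics only and is not Snowflake's implementation. DISTINCT is not modeled.

```python
from collections import defaultdict

def listagg_over_partition(rows, value_key, partition_key, delimiter=", "):
    """Emulate LISTAGG(value, delimiter) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY key)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row[value_key])
    # Unlike GROUP BY, every input row survives and gets its own window's result.
    return [
        {**row, "listagg": delimiter.join(sorted(partitions[row[partition_key]]))}
        for row in rows
    ]

rows = [  # hypothetical input rows
    {"city": "Paris", "name": "Bea"},
    {"city": "Oslo", "name": "Ann"},
    {"city": "Paris", "name": "Abe"},
]
for row in listagg_over_partition(rows, "name", "city"):
    print(row)
```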
NEW QUESTION # 30
Which Snowflake parameter can be used to automatically suspend tasks that run data science pipelines after a specified number of failed runs?
- A. SUSPEND_TASK
- B. SUSPEND_TASK_AUTO_NUM_FAILURES
- C. There is none as such available.
- D. SUSPEND_TASK_AFTER_NUM_FAILURES
Answer: D
Explanation:
Automatically Suspend Tasks After Failed Runs
Optionally suspend tasks automatically after a specified number of consecutive runs that either fail or time out.
This feature can reduce costs by suspending tasks that consume Snowflake credits but fail to run to completion. Failed task runs include runs in which the SQL code in the task body either produces a user error or times out. Task runs that are skipped, canceled, or that fail due to a system error are considered indeterminate and are not included in the count of failed task runs.
Set the SUSPEND_TASK_AFTER_NUM_FAILURES = num parameter on a standalone task or the root task in a DAG. When the parameter is set to a value greater than 0, the following behavior applies to runs of the standalone task or DAG:
Standalone tasks are automatically suspended after the specified number of consecutive task runs either fail or time out.
The root task is automatically suspended after the run of any single task in a DAG fails or times out the specified number of times in consecutive runs.
The parameter can be set when creating a task (using CREATE TASK) or later (using ALTER TASK). The setting applies to tasks that rely on either Snowflake-managed compute resources (i.e. serverless compute model) or user-managed compute resources (i.e. a virtual warehouse).
The SUSPEND_TASK_AFTER_NUM_FAILURES parameter can also be set at the account, database, or schema level. The setting applies to all standalone or root tasks contained in the modified object. Note that explicitly setting the parameter at a lower (i.e. more granular) level overrides the parameter value set at a higher level.
NEW QUESTION # 31
Which command manually triggers a single run of a scheduled task (either a standalone task or the root task in a DAG) independent of the schedule defined for the task?
- A. CALL TASK
- B. EXECUTE TASK
- C. RUN TASK
- D. RUN ROOT TASK
Answer: B
Explanation:
The EXECUTE TASK command manually triggers a single run of a scheduled task (either a standalone task or the root task in a DAG) independent of the schedule defined for the task. A successful run of a root task triggers a cascading run of child tasks in the DAG as their precedent task completes, as though the root task had run on its defined schedule.
This SQL command is useful for testing new or modified standalone tasks and DAGs before you enable them to execute SQL code in production.
Call this SQL command directly in scripts or in stored procedures. In addition, this command supports integrating tasks in external data pipelines. Any third-party services that can authenticate into your Snowflake account and authorize SQL actions can execute the EXECUTE TASK command to run tasks.
NEW QUESTION # 32
Consider a data frame df with columns ['A', 'B', 'C', 'D'] and rows ['r1', 'r2', 'r3']. What does the expression df[lambda x : x.index.str.endswith('3')] do?
- A. Filters the row labelled r3
- B. Results in Error
- C. Returns the third column
- D. Returns the row name r3
Answer: A
Explanation:
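It filters the rows and returns only the row labelled r3. The lambda receives df and returns a boolean mask over the index labels, which df[...] then uses for row selection.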
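The behaviour can be reproduced with a small pandas frame. The cell values below are hypothetical; only the labels come from the question.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],  # hypothetical values
    index=["r1", "r2", "r3"],
    columns=["A", "B", "C", "D"],
)

# The callable receives df itself; index.str.endswith returns one boolean per row label,
# so df[...] performs boolean row selection and keeps all four columns.
print(df.index.str.endswith("3"))
result = df[lambda x: x.index.str.endswith("3")]
print(result)
```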
NEW QUESTION # 33
You previously trained a model using a training dataset. You want to detect any data drift in the new data collected since the model was trained.
What should you do?
- A. Retrain the model on your training dataset after correcting data outliers, without introducing new data.
- B. Create a new version of the dataset using only the new data and retrain the model.
- C. Create a new dataset using the new data and a timestamp column and create a data drift monitor that uses the training dataset as a baseline and the new dataset as a target.
- D. Add the new data to the existing dataset and enable Application Insights for the service where the model is deployed.
Answer: C
Explanation:
To track changing data trends, create a data drift monitor that uses the training data as a baseline and the new data as a target.
Model drift and decay are concepts that describe the process during which the performance of a model deployed to production degrades on new, unseen data or the underlying assumptions about the data change.
These are important metrics to track once models are deployed to production. Models must be regularly retrained on new data. This is referred to as refitting the model. This can be done either on a periodic basis, or, in an ideal scenario, retraining can be triggered when the performance of the model degrades below a certain pre-defined threshold.
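One common way to quantify drift between a baseline (training) sample and a target (new) sample of a feature is the two-sample Kolmogorov-Smirnov statistic. It measures the largest gap between the two empirical CDFs: 0 means the samples have identical distributions, and larger values mean more drift. This minimal Python sketch is one possible drift metric, not necessarily the one a particular data drift monitor uses. The feature values are hypothetical.

```python
def ks_statistic(baseline, target):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    points = sorted(set(baseline) | set(target))  # the gap can only change at observed values
    return max(abs(ecdf(baseline, x) - ecdf(target, x)) for x in points)

baseline = [10, 11, 12, 13, 14, 15]  # hypothetical feature values from the training data
same = [10, 11, 12, 13, 14, 15]      # hypothetical new data with no drift
drifted = [13, 14, 15, 16, 17, 18]   # hypothetical new data shifted upward

print(ks_statistic(baseline, same))     # 0.0
print(ks_statistic(baseline, drifted))  # about 0.5
```

A monitoring job could compute such a statistic per feature on each new batch and raise an alert, or trigger retraining, once it exceeds a chosen threshold.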
NEW QUESTION # 34
Which of the following metrics are used to evaluate classification models?
- A. All of the above
- B. Area under the ROC curve
- C. Confusion matrix
- D. F1 score
Answer: A
Explanation:
Evaluation metrics are tied to machine learning tasks. There are different metrics for the tasks of classification and regression. Some metrics, like precision-recall, are useful for multiple tasks. Classification and regression are examples of supervised learning, which constitutes a majority of machine learning applications. Using different metrics for performance evaluation, we should be able to improve our model's overall predictive power before we roll it out for production on unseen data. Relying only on accuracy, without a proper evaluation using several metrics, can lead to problems when the model is deployed on unseen data and may result in poor predictions.
Classification metrics are evaluation measures used to assess the performance of a classification model.
Common metrics include accuracy (proportion of correct predictions), precision (true positives over total predicted positives), recall (true positives over total actual positives), F1 score (harmonic mean of precision and recall), and area under the receiver operating characteristic curve (AUC-ROC).
Confusion Matrix
Confusion Matrix is a performance measurement for the machine learning classification problems where the output can be two or more classes. It is a table with combinations of predicted and actual values.
It is extremely useful for computing recall, precision, and accuracy, and for building the ROC curve and its AUC.
The four commonly used metrics for evaluating classifier performance are:
1. Accuracy: The proportion of correct predictions out of the total predictions.
2. Precision: The proportion of true positive predictions out of the total positive predictions (precision = true positives / (true positives + false positives)).
3. Recall (Sensitivity or True Positive Rate): The proportion of true positive predictions out of the total actual positive instances (recall = true positives / (true positives + false negatives)).
4. F1 Score: The harmonic mean of precision and recall, providing a balance between the two metrics (F1 score = 2 * ((precision * recall) / (precision + recall))).
These metrics help assess the classifier's effectiveness in correctly classifying instances of different classes.
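The four formulas above can be computed directly from confusion-matrix counts. This minimal Python sketch does that for a hypothetical binary example. Returning 0.0 when a denominator is zero is a convention chosen for the sketch. The parameter name positive is also a hypothetical choice.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0.0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0     # convention: 0.0 when there are no actual positives
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical predicted labels
print(classification_metrics(y_true, y_pred))
# {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3, 'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```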
Understanding how well a machine learning model will perform on unseen data is the main purpose behind working with these evaluation metrics. Metrics like accuracy, precision, recall are good ways to evaluate classification models for balanced datasets, but if the data is imbalanced then other methods like ROC/AUC perform better in evaluating the model performance.
The ROC curve is not a single number but a whole curve that provides nuanced detail about the behavior of the classifier. This also makes it hard to quickly compare many ROC curves to each other.
NEW QUESTION # 35
Consider a data frame df with 10 rows and index [ 'r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10'].
What does the expression g = df.groupby(df.index.str.len()) do?
- A. Groups df based on index values
- B. Groups df based on length of each index value
- C. Data frames cannot be grouped by index values. Hence it results in Error.
- D. Groups df based on index strings
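Answer: B
Explanation:
df.index.str.len() returns the length of each index label, and groupby accepts such an array of keys with one key per row. The rows are therefore grouped by the length of their index values: 'r1', 'r2', 'r3', 'r7', 'r8', 'r9' (length 2), 'row4', 'row5', 'row6' (length 4), and 'row10' (length 5).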
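The grouping can be checked with a small pandas frame using the index from the question. The single value column is hypothetical.

```python
import pandas as pd

index = ["r1", "r2", "r3", "row4", "row5", "row6", "r7", "r8", "r9", "row10"]
df = pd.DataFrame({"value": range(10)}, index=index)  # hypothetical column

# df.index.str.len() yields one integer per row, which groupby uses as that row's group key.
g = df.groupby(df.index.str.len())
print(g.size())  # group sizes keyed by label length
```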
NEW QUESTION # 36
Which type of machine learning do data scientists generally use for solving classification and regression problems?
- A. Instructor Learning
- B. Unsupervised
- C. Regression Learning
- D. Reinforcement Learning
- E. Supervised
Answer: E
Explanation:
Supervised Learning
Overview:
Supervised learning is a type of machine learning that uses labeled data to train machine learning models. In labeled data, the output is already known. The model just needs to map the inputs to the respective outputs.
Algorithms:
Some of the most popularly used supervised learning algorithms are:
Linear Regression
Logistic Regression
Support Vector Machine
K Nearest Neighbor
Decision Tree
Random Forest
Naive Bayes
Working:
Supervised learning algorithms take labelled inputs and map them to the known outputs, which means you already know the target variable.
Supervised Learning methods need external supervision to train machine learning models. Hence, the name supervised. They need guidance and additional information to return the desired result.
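The idea of mapping labelled inputs to known outputs can be shown with the simplest form of K Nearest Neighbor from the list above, using k = 1. This minimal Python sketch assigns a query the label of its closest labelled training example. The data and labels are hypothetical.

```python
import math

def nearest_neighbour_predict(train_x, train_y, query):
    """1-nearest-neighbour: return the label of the closest labelled training example."""
    distances = [math.dist(x, query) for x in train_x]  # Euclidean distances (Python 3.8+)
    return train_y[distances.index(min(distances))]

# Labelled data: every input already has its known output (hypothetical values).
train_x = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
train_y = ["small", "small", "large", "large"]

print(nearest_neighbour_predict(train_x, train_y, (2.0, 1.5)))  # small
print(nearest_neighbour_predict(train_x, train_y, (8.5, 8.0)))  # large
```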
Applications:
Supervised learning algorithms are generally used for solving classification and regression problems.
A few of the top supervised learning applications are weather prediction, sales forecasting, and stock price analysis.
NEW QUESTION # 37
What can a Snowflake Data Scientist do in the Snowflake Marketplace as a provider?
- A. Share live datasets securely and in real-time without creating copies of the data or imposing data integration tasks on the consumer.
- B. Eliminate the costs of building and maintaining APIs and data pipelines to deliver data to customers.
- C. Publish listings for free-to-use datasets to generate interest and new opportunities among the Snowflake customer base.
- D. Publish listings for datasets that can be customized for the consumer.
Answer: A,B,C,D
Explanation:
All are correct!
About the Snowflake Marketplace
You can use the Snowflake Marketplace to discover and access third-party data and services, as well as market your own data products across the Snowflake Data Cloud.
As a data provider, you can use listings on the Snowflake Marketplace to share curated data offerings with many consumers simultaneously, rather than maintain sharing relationships with each individual consumer.
With Paid Listings, you can also charge for your data products.
As a consumer, you might use the data provided on the Snowflake Marketplace to explore and access the following:
Historical data for research, forecasting, and machine learning.
Up-to-date streaming data, such as current weather and traffic conditions.
Specialized identity data for understanding subscribers and audience targets.
New insights from unexpected sources of data.
The Snowflake Marketplace is available globally to all non-VPS Snowflake accounts hosted on Amazon Web Services, Google Cloud Platform, and Microsoft Azure, with the exception of Microsoft Azure Government.
Support for Microsoft Azure Government is planned.
NEW QUESTION # 38
In which of the following ways can a Data Scientist query, process, and transform data using Snowpark Python? [Select 2]
- A. Query and process data with a DataFrame object.
- B. Write a user-defined tabular function (UDTF) that processes data and returns data in a set of rows with one or more columns.
- C. Snowpark currently does not support writing UDTFs.
- D. Transform Data using DataIKY tool with SnowPark API.
Answer: A,B
Explanation:
Query and process data with a DataFrame object. Refer to Working with DataFrames in Snowpark Python.
Convert custom lambdas and functions to user-defined functions (UDFs) that you can call to process data.
Write a user-defined tabular function (UDTF) that processes data and returns data in a set of rows with one or more columns.
Write a stored procedure that you can call to process data, or automate with a task to build a data pipeline.
DSA-C02 Exam Dumps Contains FREE Real Questions from the Actual Exam: https://www.exam4free.com/DSA-C02-valid-dumps.html
Free Test Engine Verified By SnowPro Advanced Certification Certified Experts: https://drive.google.com/open?id=1OGPfli-g7YbDMo1uKAM_eIHpWsbWwFni
