Free Professional-Data-Engineer Exam Files Verified & Correct Answers Downloaded Instantly
NEW QUESTION 26
What is the general recommendation when designing your row keys for a Cloud Bigtable schema?
- A. Keep your row key reasonably short
- B. Keep the row key as an 8-bit integer
- C. Keep your row key as long as the field permits
- D. Include multiple time series values within the row key
Answer: A
Explanation:
As a general guideline, keep your row keys reasonably short. Long row keys take up additional memory and storage and increase the time it takes to get responses from the Cloud Bigtable server.
Reference: https://cloud.google.com/bigtable/docs/schema-design#row-keys
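As a rough illustration of why key length matters, the sketch below builds a compact row key from a hypothetical device ID and timestamp and compares its encoded size with a verbose alternative. The function names, the field layout, and the `#` delimiter are assumptions made for this example only; they are not taken from the question or from the Cloud Bigtable API.

```python
from datetime import datetime, timezone


def short_row_key(device_id: str, ts: datetime) -> bytes:
    """Build a compact row key: identifier first, then a fixed-width timestamp."""
    # YYYYMMDDHHMM keeps keys sortable by time without spelling out field names.
    return f"{device_id}#{ts.strftime('%Y%m%d%H%M')}".encode("utf-8")


def verbose_row_key(device_id: str, ts: datetime) -> bytes:
    """A needlessly long key that repeats labels in every row (for comparison)."""
    return f"device_identifier={device_id}/timestamp={ts.isoformat()}".encode("utf-8")


ts = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
short_key = short_row_key("dev42", ts)
long_key = verbose_row_key("dev42", ts)

# Every stored row carries its key, so the per-row difference adds up quickly.
print(len(short_key), len(long_key))
```

Both keys identify the same row; the short one simply avoids repeating label text that carries no information.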
NEW QUESTION 27
Which of these statements about BigQuery caching is true?
- A. There is no charge for a query that retrieves its results from cache.
- B. By default, a query's results are not cached.
- C. Query results are cached even if you specify a destination table.
- D. BigQuery caches query results for 48 hours.
Answer: A
Explanation:
When query results are retrieved from a cached results table, you are not charged for the query.
BigQuery caches query results for 24 hours, not 48 hours.
Query results are not cached if you specify a destination table.
A query's results are always cached except under certain conditions, such as if you specify a destination table.
Reference: https://cloud.google.com/bigquery/querying-data#query-caching
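The three rules in the explanation (24-hour lifetime, no caching with a destination table, no charge on a cache hit) can be summarized in a toy model. This is a plain-Python sketch of the rules as stated above, not the BigQuery client API; `served_from_cache` and `billed_bytes` are hypothetical helper names.

```python
from datetime import datetime, timedelta
from typing import Optional

CACHE_TTL = timedelta(hours=24)  # cache lifetime stated in the explanation


def served_from_cache(cached_at: Optional[datetime], now: datetime,
                      destination_table: Optional[str] = None) -> bool:
    """Return True if this toy model would answer the query from cache."""
    if destination_table is not None:
        # Queries that specify a destination table are not cached.
        return False
    if cached_at is None:
        # Nothing has been cached for this query yet.
        return False
    return now - cached_at < CACHE_TTL


def billed_bytes(bytes_scanned: int, cache_hit: bool) -> int:
    """A cache hit is not charged; otherwise the scanned bytes are billed."""
    return 0 if cache_hit else bytes_scanned
```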
NEW QUESTION 28
You are implementing security best practices on your data pipeline. Currently, you are manually executing jobs as the Project Owner. You want to automate these jobs by taking nightly batch files containing non-public information from Google Cloud Storage, processing them with a Spark Scala job on a Google Cloud Dataproc cluster, and depositing the results into Google BigQuery.
How should you securely run this workload?
- A. Use a user account with the Project Viewer role on the Cloud Dataproc cluster to read the batch files and write to BigQuery
- B. Restrict the Google Cloud Storage bucket so only you can see the files
- C. Grant the Project Owner role to a service account, and run the job with it
- D. Use a service account with the ability to read the batch files and to write to BigQuery
Answer: D
NEW QUESTION 29
You receive data files in CSV format monthly from a third party. You need to cleanse this data, but every third month the schema of the files changes. Your requirements for implementing these transformations include:
* Executing the transformations on a schedule
* Enabling non-developer analysts to modify transformations
* Providing a graphical tool for designing transformations
What should you do?
- A. Use Apache Spark on Cloud Dataproc to infer the schema of the CSV file before creating a DataFrame. Then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading into BigQuery
- B. Help the analysts write a Cloud Dataflow pipeline in Python to perform the transformation. The Python code should be stored in a revision control system and modified as the incoming data's schema changes
- C. Load each month's CSV data into BigQuery, and write a SQL query to transform the data to a standard schema. Merge the transformed tables together with a SQL query
- D. Use Cloud Dataprep to build and maintain the transformation recipes, and execute them on a scheduled basis
Answer: D
NEW QUESTION 30
You work for a shipping company that has distribution centers where packages move on delivery lines to route them properly. The company wants to add cameras to the delivery lines to detect and track any visual damage to the packages in transit. You need to create a way to automate the detection of damaged packages and flag them for human review in real time while the packages are in transit. Which solution should you choose?
- A. Use BigQuery machine learning to be able to train the model at scale, so you can analyze the packages in batches.
- B. Train an AutoML model on your corpus of images, and build an API around that model to integrate with the package tracking applications.
- C. Use TensorFlow to create a model that is trained on your corpus of images. Create a Python notebook in Cloud Datalab that uses this model so you can analyze for damaged packages.
- D. Use the Cloud Vision API to detect for damage, and raise an alert through Cloud Functions. Integrate the package tracking applications with this function.
Answer: B
NEW QUESTION 31
Your company is in a highly regulated industry. One of your requirements is to ensure individual users have access only to the minimum amount of information required to do their jobs. You want to enforce this requirement with Google BigQuery. Which three approaches can you take? (Choose three.)
- A. Segregate data across multiple tables or databases.
- B. Use Google Stackdriver Audit Logging to determine policy violations.
- C. Ensure that the data is encrypted at all times.
- D. Restrict BigQuery API access to approved users.
- E. Restrict access to tables by role.
- F. Disable writes to certain tables.
Answer: B,D,E
NEW QUESTION 32
You decided to use Cloud Datastore to ingest vehicle telemetry data in real time. You want to build a storage system that will account for the long-term data growth, while keeping the costs low. You also want to create snapshots of the data periodically, so that you can make a point-in-time (PIT) recovery, or clone a copy of the data for Cloud Datastore in a different environment. You want to archive these snapshots for a long time. Which two methods can accomplish this? (Choose two.)
- A. Use managed export, and then import to Cloud Datastore in a separate project under a unique namespace reserved for that export.
- B. Write an application that uses Cloud Datastore client libraries to read all the entities. Format the exported data into a JSON file. Apply compression before storing the data in Cloud Source Repositories.
- C. Write an application that uses Cloud Datastore client libraries to read all the entities. Treat each entity as a BigQuery table row via BigQuery streaming insert. Assign an export timestamp for each export, and attach it as an extra column for each row. Make sure that the BigQuery table is partitioned using the export timestamp column.
- D. Use managed export, and then import the data into a BigQuery table created just for that export, and delete temporary export files.
- E. Use managed export, and store the data in a Cloud Storage bucket using Nearline or Coldline class.
Answer: A,E
NEW QUESTION 33
Your organization has been collecting and analyzing data in Google BigQuery for 6 months. The majority of the data analyzed is placed in a time-partitioned table named events_partitioned. To reduce the cost of queries, your organization created a view called events, which queries only the last 14 days of data. The view is described in legacy SQL. Next month, existing applications will be connecting to BigQuery to read the events data via an ODBC connection. You need to ensure the applications can connect. Which two actions should you take? (Choose two.)
- A. Create a Google Cloud Identity and Access Management (Cloud IAM) role for the ODBC connection and shared "events"
- B. Create a new view over events_partitioned using standard SQL
- C. Create a new view over events using standard SQL
- D. Create a service account for the ODBC connection to use for authentication
- E. Create a new partitioned table using a standard SQL query
Answer: B,D
NEW QUESTION 34
Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour. The data scientists have written the following code to read the data for new key features in the logs.
BigQueryIO.Read
.named("ReadLogData")
.from("clouddataflow-readonly:samples.log_data")
You want to improve the performance of this data read. What should you do?
- A. Use both the Google BigQuery TableSchema and TableFieldSchema classes.
- B. Specify the Table object in the code.
- C. Call a transform that returns TableRow objects, where each element in the PCollection represents a single row in the table.
- D. Use .fromQuery operation to read specific fields from the table.
Answer: D
Explanation:
BigQueryIO.read.from() directly reads the whole table from BigQuery. This function exports the whole table to temporary files in Google Cloud Storage, where it will later be read from. This requires almost no computation, as it only performs an export job, and later Dataflow reads from GCS (not from BigQuery).
BigQueryIO.read.fromQuery() executes a query and then reads the results returned by that query. This function is therefore more time-consuming, because the query must run first (which incurs the corresponding economic and computational costs).
NEW QUESTION 35
You are designing a basket abandonment system for an ecommerce company. The system will send a message to a user based on these rules:
* No interaction by the user on the site for 1 hour
* Has added more than $30 worth of products to the basket
* Has not completed a transaction
You use Google Cloud Dataflow to process the data and decide if a message should be sent. How should you design the pipeline?
- A. Use a fixed-time window with a duration of 60 minutes.
- B. Use a global window with a time-based trigger with a delay of 60 minutes.
- C. Use a session window with a gap time duration of 60 minutes.
- D. Use a sliding time window with a duration of 60 minutes.
Answer: C
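Option C's session window closes after a period of inactivity rather than at fixed clock boundaries, which matches the "no interaction for 1 hour" rule. The plain-Python sketch below groups one user's event timestamps using a 60-minute gap. It only illustrates the grouping rule and is not Dataflow code; `sessionize` is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from typing import List

GAP = timedelta(minutes=60)  # inactivity gap from the question


def sessionize(timestamps: List[datetime], gap: timedelta = GAP) -> List[List[datetime]]:
    """Group one user's event times into sessions split by `gap` of inactivity."""
    sessions: List[List[datetime]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)   # still inside the current session
        else:
            sessions.append([ts])     # gap reached or first event: new session
    return sessions


events = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 20),
    datetime(2024, 1, 1, 9, 50),
    datetime(2024, 1, 1, 11, 0),   # 70 minutes after the previous event
]
sessions = sessionize(events)
```

Each closed session is the point at which a pipeline could check the basket value and the absence of a completed transaction before sending the message.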
NEW QUESTION 36
Which of the following is not possible using primitive roles?
- A. Give GroupA owner access and GroupB editor access for all datasets in a project.
- B. Give UserA owner access and UserB editor access for all datasets in a project.
- C. Give a user viewer access to BigQuery and owner access to Google Compute Engine instances.
- D. Give a user access to view all datasets in a project, but not run queries on them.
Answer: D
Explanation:
Primitive roles can be used to give owner, editor, or viewer access to a user or group, but they can't be used to separate data access permissions from job-running permissions.
NEW QUESTION 37
What is the recommended action to do in order to switch between SSD and HDD storage for your Google Cloud Bigtable instance?
- A. the selection is final and you must resume using the same storage type
- B. export the data from the existing instance and import the data into a new instance
- C. create a third instance and sync the data from the two storage types via batch jobs
- D. run parallel instances where one is HDD and the other is SSD
Answer: B
Explanation:
When you create a Cloud Bigtable instance and cluster, your choice of SSD or HDD storage for the cluster is permanent. You cannot use the Google Cloud Platform Console to change the type of storage that is used for the cluster.
If you need to convert an existing HDD cluster to SSD, or vice-versa, you can export the data from the existing instance and import the data into a new instance. Alternatively, you can write a Cloud Dataflow or Hadoop MapReduce job that copies the data from one instance to another.
Reference: https://cloud.google.com/bigtable/docs/choosing-ssd-hdd-
NEW QUESTION 38
Which role must be assigned to a service account used by the virtual machines in a Dataproc cluster so they can execute jobs?
- A. Dataproc Worker
- B. Dataproc Viewer
- C. Dataproc Runner
- D. Dataproc Editor
Answer: A
Explanation:
Service accounts used with Cloud Dataproc must have the Dataproc Worker role (or have all the permissions granted by the Dataproc Worker role).
Reference: https://cloud.google.com/dataproc/docs/concepts/service-accounts#important_notes
NEW QUESTION 39
Which of these is not a supported method of putting data into a partitioned table?
- A. If you have existing data in a separate file for each day, then create a partitioned table and upload each file into the appropriate partition.
- B. Create a partitioned table and stream new records to it every day.
- C. Use ORDER BY to put a table's rows into chronological order and then change the table's type to "Partitioned".
- D. Run a query to get the records for a specific day from an existing table and for the destination table, specify a partitioned table ending with the day in the format "$YYYYMMDD".
Answer: C
Explanation:
You cannot change an existing table into a partitioned table. You must create a partitioned table from scratch. Then you can either stream data into it every day and the data will automatically be put in the right partition, or you can load data into a specific partition by using "$YYYYMMDD" at the end of the table name.
Reference: https://cloud.google.com/bigquery/docs/partitioned-tables
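The "$YYYYMMDD" suffix mentioned above is plain string formatting on the table name. A minimal sketch, assuming a hypothetical table called `mydataset.events`; the helper name is also made up for illustration:

```python
from datetime import date


def partition_decorator(table: str, day: date) -> str:
    """Return 'table$YYYYMMDD', the name that addresses a single daily partition."""
    return f"{table}${day.strftime('%Y%m%d')}"


# Destination name for data belonging to 7 March 2024.
target = partition_decorator("mydataset.events", date(2024, 3, 7))
print(target)
```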
NEW QUESTION 40
Your company needs to upload their historic data to Cloud Storage. The security rules don't allow access from external IPs to their on-premises resources. After an initial upload, they will add new data from existing on-premises applications every day. What should they do?
- A. Write a job template in Cloud Dataproc to perform the data transfer.
- B. Use Cloud Dataflow and write the data to Cloud Storage.
- C. Install an FTP server on a Compute Engine VM to receive the files and move them to Cloud Storage.
- D. Execute gsutil rsync from the on-premises servers.
Answer: D
NEW QUESTION 41
Why do you need to split a machine learning dataset into training data and test data?
- A. To make sure your model is generalized for more than just the training data
- B. So you can try two different sets of features
- C. So you can use one dataset for a wide model and one for a deep model
- D. To allow you to create unit tests in your code
Answer: A
Explanation:
The flaw with evaluating a predictive model on training data is that it does not inform you on how well the model has generalized to new unseen data. A model that is selected for its accuracy on the training dataset rather than its accuracy on an unseen test dataset is very likely to have lower accuracy on an unseen test dataset. The reason is that the model is not as generalized. It has specialized to the structure in the training dataset. This is called overfitting.
Reference: https://machinelearningmastery.com/a-simple-intuition-for-overfitting/
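A hold-out split of the kind described above can be sketched in a few lines of standard-library Python. The 80/20 ratio and the fixed seed are assumptions chosen for the example, and `train_test_split` here is a hand-written helper rather than a library function.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the rows reproducibly and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test rows are held out; the model never sees them in training.
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(100))
print(len(train), len(test))
```

Accuracy measured on `test` then estimates how the model behaves on unseen data, which accuracy on `train` cannot show.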
NEW QUESTION 42
You are using BigQuery and Data Studio to design a customer-facing dashboard that displays large quantities of aggregated data. You expect a high volume of concurrent users. You need to optimize the dashboard to provide quick visualizations with minimal latency. What should you do?
- A. Use BigQuery BI Engine with authorized views
- B. Use BigQuery BI Engine with materialized views
- C. Use BigQuery BI Engine with logical views
- D. Use BigQuery BI Engine with streaming data
Answer: B
NEW QUESTION 43
Suppose you have a table that includes a nested column called "city" inside a column called "person", but when you try to submit the following query in BigQuery, it gives you an error.
SELECT person FROM `project1.example.table1` WHERE city = "London"
How would you correct the error?
- A. Add ", UNNEST(person)" before the WHERE clause.
- B. Change "person" to "person.city".
- C. Change "person" to "city.person".
- D. Add ", UNNEST(city)" before the WHERE clause.
Answer: A
Explanation:
To access the person.city column, you need to "UNNEST(person)" and JOIN it to table1 using a comma.
Reference: https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql#nested_repeated_results
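The comma join with UNNEST(person) produces one output row per element of the repeated "person" field, which is what makes "city" addressable in the WHERE clause. The sketch below emulates that behavior in plain Python on made-up rows; the sample data and the helper name are assumptions, and this is not how BigQuery executes the query.

```python
from typing import Dict, Iterator, List

# Hypothetical rows: each has a repeated `person` field holding nested records.
table1: List[Dict] = [
    {"id": 1, "person": [{"name": "Ann", "city": "London"},
                         {"name": "Bob", "city": "Paris"}]},
    {"id": 2, "person": [{"name": "Cy", "city": "London"}]},
]


def unnest_person(rows: List[Dict]) -> Iterator[Dict]:
    """Emulate `FROM table1, UNNEST(person)`: one output row per nested element."""
    for row in rows:
        for p in row["person"]:
            # The nested fields (name, city) become top-level columns.
            yield {"id": row["id"], **p}


# Equivalent of `WHERE city = "London"` applied after the unnest.
londoners = [r for r in unnest_person(table1) if r["city"] == "London"]
```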
NEW QUESTION 44
Which of these rules apply when you add preemptible workers to a Dataproc cluster (select 2 answers)?
- A. Preemptible workers cannot use persistent disk.
- B. If a preemptible worker is reclaimed, then a replacement worker must be added manually.
- C. Preemptible workers cannot store data.
- D. A Dataproc cluster cannot have only preemptible workers.
Answer: C,D
Explanation:
The following rules will apply when you use preemptible workers with a Cloud Dataproc cluster:
* Processing only: Since preemptibles can be reclaimed at any time, preemptible workers do not store data. Preemptibles added to a Cloud Dataproc cluster only function as processing nodes.
* No preemptible-only clusters: To ensure clusters do not lose all workers, Cloud Dataproc cannot create preemptible-only clusters.
* Persistent disk size: By default, all preemptible workers are created with the smaller of 100GB or the primary worker boot disk size. This disk space is used for local caching of data and is not available through HDFS.
The managed group automatically re-adds workers lost due to reclamation as capacity permits.
NEW QUESTION 45
You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. Initially, design the application to use streaming inserts for individual postings. Your application also performs data aggregations right after the streaming inserts. You discover that the queries after streaming inserts do not exhibit strong consistency, and reports from the queries might miss in-flight data. How can you adjust your application design?
- A. Convert the streaming insert code to batch load for individual messages.
- B. Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long.
- C. Load the original message to Google Cloud SQL, and export the table every hour to BigQuery via streaming inserts.
- D. Re-write the application to load accumulated data every 2 minutes.
Answer: B
Explanation:
Streamed data first lands in a buffer and is then written to storage. If queries run while the data is still in the buffer, they face the issues described above. If we wait for BigQuery to write the data to storage, we won't face the issue.
So we need to wait until the data is written to storage.
