C. Use Stackdriver Debugger to review the execution of logic within each application to instrument all applications.
A. Instrument all applications with Stackdriver Profiler.
B. Instrument all applications with Stackdriver Trace and review inter-service HTTP requests.
D. Modify the Node.js application to log HTTP request and response times to dependent applications. Use Stackdriver Logging to find dependent applications that are performing poorly.
C. Click "Share chart by URL" and provide the URL to the SRE team. Assign the SRE team the Monitoring Viewer IAM role in the workspace project.
A. Share the workspace Project ID with the SRE team. Assign the SRE team the Monitoring Viewer IAM role in the workspace project.
B. Share the workspace Project ID with the SRE team. Assign the SRE team the Dashboard Viewer IAM role in the workspace project.
D. Click "Share chart by URL" and provide the URL to the SRE team. Assign the SRE team the Dashboard Viewer IAM role in the workspace project.
D. Develop a postmortem that includes the root causes, resolution, lessons learned, the list of people responsible, and a list of action items for each person. Share it on the engineering organization's document portal.
A. Develop a postmortem that includes the root causes, resolution, lessons learned, and a prioritized list of action items. Share it with the manager only.
C. Develop a postmortem that includes the root causes, resolution, lessons learned, the list of people responsible, and a list of action items for each person. Share it with the manager only.
B. Develop a postmortem that includes the root causes, resolution, lessons learned, and a prioritized list of action items. Share it on the engineering organization's document portal.
A. Use the default Stackdriver Kubernetes Engine Monitoring agent configuration.
B. Deploy a Fluentd daemonset to GKE. Then create a customized input and output configuration to tail the log file in the application's pods and write to Stackdriver Logging.
C. Install Kubernetes on Google Compute Engine (GCE) and redeploy your applications. Then customize the built-in Stackdriver Logging configuration to tail the log file in the application's pods and write to Stackdriver Logging.
D. Write a script to tail the log file within the pod and write entries to standard output. Run the script as a sidecar container with the application's pod. Configure a shared volume between the containers to allow the script to have read access to /var/log in the application container.
A. Look for the agent's test log entry in the Logs Viewer.
B. Install the most recent version of the Stackdriver agent.
C. Verify the VM service account access scope includes the monitoring.write scope.
D. SSH to the VM and execute the following commands on your VM: ps ax | grep fluentd.
A. Add logic to each Cloud Build step to HTTP POST the build information to a webhook.
B. Add a new step at the end of the pipeline in Cloud Build to HTTP POST the build information to a webhook.
C. Use Stackdriver Logging to create a logs-based metric from the Cloud Build logs. Create an Alert with a Webhook notification type.
D. Create a Cloud Pub/Sub push subscription to the Cloud Build cloud-builds Pub/Sub topic to HTTP POST the build information to a webhook.
A. Compare the canary with a new deployment of the current production version.
B. Compare the canary with a new deployment of the previous production version.
C. Compare the canary with the existing deployment of the current production version.
D. Compare the canary with the average performance of a sliding window of previous production versions.
A. Bucketize the request latencies into ranges, and then compute the percentile at 100 ms.
B. Bucketize the request latencies into ranges, and then compute the median and 90th percentiles.
C. Count the number of home page requests that load in under 100 ms, and then divide by the total number of home page requests.
D. Count the number of home page requests that load in under 100 ms, and then divide by the total number of all web application requests.
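
The count-and-divide calculation described in the options above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name, the `(path, latency_ms)` sample format, and the sample data are all hypothetical and do not come from the source. It shows the variant that divides by home page requests only.

```python
from typing import Iterable, Tuple


def home_page_latency_sli(
    requests: Iterable[Tuple[str, float]],
    threshold_ms: float = 100.0,
    home_path: str = "/",
) -> float:
    """Return the fraction of home page requests served in under threshold_ms.

    `requests` is a hypothetical iterable of (path, latency_ms) pairs.
    """
    # Keep only latencies that belong to the home page.
    home_latencies = [latency for path, latency in requests if path == home_path]
    if not home_latencies:
        # Assumed convention for this sketch: no traffic yields 0.0.
        return 0.0
    good = sum(1 for latency in home_latencies if latency < threshold_ms)
    return good / len(home_latencies)


# Hypothetical sample data: four home page requests and one other request.
samples = [("/", 80.0), ("/", 120.0), ("/cart", 300.0), ("/", 95.0), ("/", 40.0)]
print(home_page_latency_sli(samples))  # 3 of 4 home page requests are under 100 ms
```

Dividing by all web application requests instead would mix unrelated traffic into the denominator, so the ratio would no longer describe the home page alone.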
A. Before merging new code, require 2 different peers to review the code changes.
B. Adopt the blue/green deployment strategy when releasing new code via a CD server.
C. Integrate a code linting tool to validate coding standards before any code is accepted into the repository.
D. Require developers to run automated integration tests on their local development environments before release.
E. Configure a CI server. Add a suite of unit tests to your code and have your CI server run them on commit and verify any changes.
D. • Install the gsutil command line tool on your application servers. • Write a script using gsutil to upload your application log to a Cloud Storage bucket, and then schedule it to run via cron every 5 minutes. • Give the developers the IAM Object Viewer access to view the logs in the specified bucket.
A. • Deploy the Stackdriver logging agent to the application servers. • Give the developers the IAM Logs Viewer role to access Stackdriver and view logs.
B. • Deploy the Stackdriver logging agent to the application servers. • Give the developers the IAM Private Logs Viewer role to access Stackdriver and view logs.
C. • Deploy the Stackdriver monitoring agent to the application servers. • Give the developers the IAM Monitoring Viewer role to access Stackdriver and view metrics.
A. Configure the VPC as a Shared VPC Host project.
B. Configure your network services on the Standard Tier.
C. Configure your Kubernetes cluster as a Private Cluster.
D. Configure a Google Cloud HTTP Load Balancer as Ingress.
B. Develop a post-mortem to be distributed to stakeholders.
A. Call individual stakeholders to explain what happened.
C. Send the Incident State Document to all the stakeholders.
D. Require the engineer responsible to write an apology email to all stakeholders.
A. Verify the maximum node pool size, enable a horizontal pod autoscaler, and then perform a load test to verify your expected resource needs.
B. Because you are deployed on GKE and are using a cluster autoscaler, your GKE cluster will scale automatically, regardless of growth rate.
D. Proactively add 60% more node capacity to account for six months of 10% growth rate, and then perform a load test to make sure you have enough capacity.
C. Because you are at only 30% utilization, you have significant headroom and you won't need to add any additional capacity for this rate of growth.
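
The growth arithmetic behind the options above can be checked with a short sketch. It assumes, as a labeled assumption not stated in the options, that the 10% growth is per month and compounds; the function name is hypothetical.

```python
def projected_utilization(current: float, growth_rate: float, periods: int) -> float:
    """Project utilization after `periods` of compounding growth.

    Assumes growth compounds each period (an assumption of this sketch).
    """
    return current * (1 + growth_rate) ** periods


# Compounded growth factor over six periods at 10% per period.
compound_factor = (1 + 0.10) ** 6
# Simple (non-compounding) factor, which is what "60% more" corresponds to.
linear_factor = 1 + 0.10 * 6

print(f"compound factor: {compound_factor:.3f}")
print(f"linear factor:   {linear_factor:.3f}")
print(f"utilization after six periods from 30%: {projected_utilization(0.30, 0.10, 6):.1%}")
```

Under these assumptions the compounded factor exceeds the linear one, which is why a load test is still needed to confirm real resource needs rather than relying on the arithmetic alone.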
A. Use Cloud Build to trigger a Spinnaker pipeline.
C. Use a custom builder in Cloud Build to trigger Jenkins pipeline.
B. Use Cloud Pub/Sub to trigger a Spinnaker pipeline.
D. Use Cloud Pub/Sub to trigger a custom deployment service running in Google Kubernetes Engine (GKE).
C. MTTD: 5 MTTR: 10 MTBF: 90 Impact: 50%
D. MTTD: 5 MTTR: 20 MTBF: 90 Impact: 50%
B. MTTD: 5 MTTR: 20 MTBF: 90 Impact: 33%
A. MTTD: 5 MTTR: 10 MTBF: 90 Impact: 33%
D. Set up the Kubernetes Engine clusters with Binary Authorization.
C. Set up the Kubernetes Engine clusters as private clusters.
B. Enable Vulnerability Analysis on the Container Registry.
A. Enable Cloud Security Scanner on the clusters.
C. Use the Stackdriver Monitoring API to create custom metrics, and then organize your containers using groups.
A. Use Stackdriver Kubernetes Engine Monitoring.
D. Use Stackdriver Logging to export application logs to BigQuery, aggregate logs per container, and then analyze CPU and memory consumption.
B. Use Prometheus to collect and aggregate logs per container, and then analyze the results in Grafana.
A. Create an automated testing script in production to detect failures as soon as they occur.
D. Create a development environment for writing code and a test environment for configurations, experiments, and load testing.
C. Secure the production environment to ensure that developers can't change it and set up one controlled update per year.
B. Create a development environment with smaller server capacity and give access only to developers and testers.
A. flex/connections/current
D. flex/instance/connections/current
C. tcp_ssl_proxy/open_connections
B. tcp_ssl_proxy/new_connections
B. Use Stackdriver Profiler to visualize the resources utilization throughout the application.
C. Determine whether there is an increased number of connections to the Cloud SQL instance.
D. Use Cloud Security Scanner to see whether your Cloud SQL is under a Distributed Denial of Service (DDoS) attack.
A. Check the serial port logs of the Compute Engine instance.
B. Supply the source control tag as a parameter within the image name.
A. Reference the image digest in the source control tag.
D. Use GCR digest versioning to match the image to the tag in source control.
C. Use Cloud Build to include the release version tag in the application image.
A. Look for ways to mitigate user impact and deploy the mitigations to production.
D. Start a postmortem, add incident information, circulate the draft internally, and ask internal stakeholders for input.
B. Contact the affected service owners and update them on the status of the incident.
C. Establish a communication channel where incident responders and leads can communicate with each other.
D. Create a new GCP monitoring project and create a Stackdriver Workspace inside it. Attach the production projects to this workspace. Grant relevant team members read access to the Stackdriver Workspace.
B. Grant relevant team members the Project Viewer IAM role on all GCP production projects. Create Stackdriver workspaces inside each project.
A. Grant relevant team members read access to all GCP production projects. Create Stackdriver workspaces inside each project.
C. Choose an existing GCP production project to host the monitoring workspace. Attach the production projects to this workspace. Grant relevant team members read access to the Stackdriver Workspace.
C. 1. Export VM utilization logs from Stackdriver to BigQuery. 2. From BigQuery, export the logs to a CSV file. 3. Import the CSV file into Google Sheets. 4. Build a dashboard in Google Sheets and share it with your stakeholders.
D. 1. Export VM utilization logs from Stackdriver to a Cloud Storage bucket. 2. Enable the Cloud Storage API to pull the logs programmatically. 3. Build a custom data visualization application. 4. Display the pulled logs in a custom dashboard.
B. 1. Export VM utilization logs from Stackdriver to Cloud Pub/Sub. 2. From Cloud Pub/Sub, send the logs to a Security Information and Event Management (SIEM) system. 3. Build the dashboards in the SIEM system and share with your stakeholders.
A. 1. Export VM utilization logs from Stackdriver to BigQuery. 2. Create a dashboard in Data Studio. 3. Share the dashboard with your stakeholders.
A. Purchase Committed Use Discounts.
B. Migrate the instances to a Managed Instance Group.
C. Convert the instances to preemptible virtual machines.
D. Create an Unmanaged Instance Group for the instances used to run the workload.
D. Bring the service into production with no SLOs and build them when you have collected operational data.
A. Adjust the SLO targets to be achievable by the service so you can bring it into production.
B. Notify the development team that they will have to provide production support for the service.
C. Identify recommended reliability improvements to the service to be completed before handover.
A. Roll back the experimental canary release.
D. Trace the origin of 500 errors and the root cause of increased latency.
C. Record data for the postmortem document of the incident.
B. Start monitoring latency, traffic, errors, and saturation.
A. • Store your code in a Git-based version control system. • Establish a process that allows developers to merge their own changes at the end of each day. • Package and upload code to a versioned Cloud Storage bucket as the latest master version.
B. • Store your code in a Git-based version control system. • Establish a process that includes code reviews by peers and unit testing to ensure integrity and functionality before integration of code. • Establish a process where the fully integrated code in the repository becomes the latest master version.
C. • Store your code as text files in Google Drive in a defined folder structure that organizes the files. • At the end of each day, confirm that all changes have been captured in the files within the folder structure. • Rename the folder structure with a predefined naming convention that increments the version.
D. • Store your code as text files in Google Drive in a defined folder structure that organizes the files. • At the end of each day, confirm that all changes have been captured in the files within the folder structure and create a new .zip archive with a predefined naming convention. • Upload the .zip archive to a versioned Cloud Storage bucket and accept it as the latest version.
A. A quality SLI: the ratio of non-degraded responses to total responses.
B. An availability SLI: the ratio of healthy microservices to the total number of microservices.
D. A latency SLI: the ratio of microservice calls that complete in under 100 ms to the total number of microservice calls.
C. A freshness SLI: the proportion of widgets that have been updated within the last 10 minutes.
C. Metrics exported from the application servers.
D. GKE health checks for your application servers.
E. A synthetic client that periodically sends simulated user requests.
B. Instrumentation coded directly in the client.
A. Your application servers' logs.
A. Publish various metrics from the application directly to the Stackdriver Monitoring API, and then observe these custom metrics in Stackdriver.
B. Install the Cloud Pub/Sub client libraries, push various metrics from the application to various topics, and then observe the aggregated metrics in Stackdriver.
D. Emit all metrics in the form of application-specific log messages, pass these messages from the containers to the Stackdriver logging collector, and then observe metrics in Stackdriver.
C. Install the OpenTelemetry client libraries in the application, configure Stackdriver as the export destination for the metrics, and then observe the application's metrics in Stackdriver.
A. File a bug with the development team so they can find the root cause of the crashing instance.
D. Create a Stackdriver Monitoring dashboard with SMS alerts to be able to start recreating the crashed instance promptly after it was crashed.
C. Add a Load Balancer in front of the Compute Engine instance and use health checks to determine the system status.
B. Create a Managed Instance Group with a single instance and use health checks to determine the system status.
B. Store secrets in a separate configuration file on Git. Provide select developers with access to the configuration file.
A. Prompt developers for secrets at build time. Instruct developers to not store secrets at rest.
D. Encrypt the secrets and store them in the source code repository. Store a decryption key in a separate repository and grant your pipeline access to it.
C. Store secrets in Cloud Storage encrypted with a key from Cloud KMS. Provide the CI/CD pipeline with access to Cloud KMS via IAM.
A. Focus on responding to internal stakeholders at least every 30 minutes. Commit to "next update" times.
C. Delegate the responding to internal stakeholder emails to another member of the Incident Response Team. Focus on providing responses directly to customers.
D. Provide all internal stakeholder emails to the Incident Commander, and allow them to manage internal communications. Focus on providing responses directly to customers.
B. Provide periodic updates to all stakeholders in a timely manner. Commit to a "next update" time in all communications.
A. Assign the Container Developer role to the Cloud Build service account.
C. Create a new service account with the Container Developer role and use it to run Cloud Build.
D. Create a separate step in Cloud Build to retrieve service account credentials and pass these to kubectl.
B. Specify the Container Developer role for Cloud Build in the cloudbuild.yaml file.
B. Configure Stackdriver Profiler to identify and visualize when the cache misses occur based on the logs.
D. Configure BigQuery as a sink for Stackdriver Logging. Create a scheduled query to filter the cache miss logs and write them to a separate table.
C. Create a logs-based metric in Stackdriver Logging and a dashboard for that metric in Stackdriver Monitoring.
A. Link Stackdriver Logging as a source in Google Data Studio. Filter the logs on the cache misses.
D. Deploy the service in one region and use a global load balancer to route traffic to this region.
B. Monitor results of Stackdriver Trace to determine the required amount of resources.
A. Use the n1-highcpu-96 machine type in the configuration of the MIG.
C. Validate that the resource requirements are within the available quota limits of each region.
B. Use a Fluentd filter plugin with the Stackdriver Agent to remove log entries containing userinfo, and then copy the entries to a Cloud Storage bucket.
C. Create an advanced log filter matching userinfo, configure a log export in the Stackdriver console with Cloud Storage as a sink, and then configure a log exclusion with userinfo as a filter.
D. Use a Fluentd filter plugin with the Stackdriver Agent to remove log entries containing userinfo, create an advanced log filter matching userinfo, and then configure a log export in the Stackdriver console with Cloud Storage as a sink.
A. Create a basic log filter matching userinfo, and then configure a log export in the Stackdriver console with Cloud Storage as a sink.
A. Disable the CI pipeline and revert to manually building and pushing the artifacts.
D. Run a Git compare between the previous and current Cloud Build Configuration files to find and fix the bug.
B. Change the CI pipeline to push the artifacts to Container Registry instead of Docker Hub.
C. Upload the configuration YAML file to Cloud Storage and use Error Reporting to identify and fix the issue.
B. Ensure that test cases that catch errors of this type are run successfully before new software releases.
C. Follow up with the employees who reviewed the changes and prescribe practices they should follow in the future.
D. Design a policy that will require on-call teams to immediately call engineers and management to discuss a plan of action if an incident occurs.
A. Identify engineers responsible for the incident and escalate to their senior management.
D. Create new synthetic clients to simulate a user journey using the application.
A. Review current application metrics and add new ones as needed.
B. Modify the code to capture additional information for user interaction.
C. Analyze the web proxy logs only and capture response time of each request.
E. Use current and historic Request Logs to trace customer interaction with the application.
C. Create and grant a custom IAM role with the permissions logging.sinks.list and logging.sinks.get.
A. Grant the team members the IAM role of logging.configWriter on Cloud IAM.
D. Create an Organizational Policy in Cloud IAM to allow only these members to create log exports.
B. Configure Access Context Manager to allow only these members to export logs.
B. Use a Binary Authorization policy that includes the whitelist name pattern gcr.io/altostrat-images/.
A. Create a custom builder for Cloud Build that will only push images to gcr.io/altostrat-images.
D. Add a tag to each image in gcr.io/altostrat-images and check that this tag is present when the image is deployed.
C. Add logic to the deployment pipeline to check that all manifests contain only images from gcr.io/altostrat-images.
D. Expose the NGINX stats endpoint and configure the horizontal pod autoscaler to use the request metrics exposed by the NGINX deployment.
C. Install the Stackdriver custom metrics adapter and configure a horizontal pod autoscaler to use the number of requests provided by the GCLB.
A. Configure the horizontal pod autoscaler to use the average response time from the Liveness and Readiness probes.
B. Configure the vertical pod autoscaler in GKE and enable the cluster autoscaler to scale the cluster as pods expand.
C. Communications Lead
A. Operations Lead
E. External Customer Communications Lead
B. Engineering Lead
D. Customer Impact Assessor
A. Download and configure a third-party integration between Stackdriver Monitoring and an SMS gateway. Ensure that your team members add their SMS/phone numbers to the external tool.
D. Configure a Slack notification for each alerting policy. Set up a Slack-to-SMS integration to send SMS messages when Slack messages are received. Ensure that your team members add their SMS/phone numbers to the external integration.
C. Ensure that your team members set their SMS/phone numbers in their Stackdriver Profile. Select the SMS notification option for each alerting policy and then select the appropriate SMS/phone numbers from the list.
B. Select the Webhook notifications option for each alerting policy, and configure it to use a third-party integration tool. Ensure that your team members add their SMS/phone numbers to the external tool.
B. • In your application, create a metric with a metricKind set to CUMULATIVE and a valueType set to DOUBLE. • In Stackdriver's Metrics Explorer, use a Line graph to visualize the metric.
D. • In your application, create a metric with a metricKind set to METRIC_KIND_UNSPECIFIED and a valueType set to INT64. • In Stackdriver's Metrics Explorer, use a Stacked Area graph to visualize the metric.
A. • In your application, create a metric with a metricKind set to DELTA and a valueType set to DOUBLE. • In Stackdriver's Metrics Explorer, use a Stacked Bar graph to visualize the metric.
C. • In your application, create a metric with a metricKind set to GAUGE and a valueType set to DISTRIBUTION. • In Stackdriver's Metrics Explorer, use a Heatmap graph to visualize the metric.
D. Install an Application Performance Monitoring (APM) tool in both locations, and configure an export to a central data storage location for analysis.
B. Import the Stackdriver Debugger package, and configure the application to emit debug messages with timing information.
A. Import the Stackdriver Profiler package, and configure it to relay function timing data to Stackdriver for further analysis.
C. Instrument the code using a timing library, and publish the metrics via a health check endpoint that is scraped by Stackdriver.
B. The organization's public-facing website.
C. A distributed, eventually consistent NoSQL database cluster with sufficient quorum.
D. A GPU-accelerated video rendering platform that retrieves and stores videos in a storage bucket.
A. A scalable in-memory caching system.
A. Configure the build system with protected branches that require pull request approval.
C. Leverage Kubernetes Role-Based Access Control (RBAC) to restrict access to only approved users.
B. Use an Admission Controller to verify that incoming requests originate from approved sources.
D. Enable binary authorization inside the Kubernetes cluster and configure the build pipeline as an attestor.
A. Change the specified SLO to match the measured SLI
D. Set up additional service instances in other zones and use them as a failover in case the primary instance is unavailable
B. Move the service to higher-specification compute instances with more memory
C. Set up additional service instances in other zones and load balance the traffic between all instances
C. Enable VPC Flow Logs on the testing and production VPC network frontend and backend subnets with a volume scale of 0.5. Apply changes in testing before production.
B. Enable VPC Flow Logs on the production VPC network frontend and backend subnets only with a sample volume scale of 1.0.
A. Enable VPC Flow Logs on the production VPC network frontend and backend subnets only with a sample volume scale of 0.5.
D. Enable VPC Flow Logs on the testing and production VPC network frontend and backend subnets with a volume scale of 1.0. Apply changes in testing before production.
B. Store the Terraform code in a network shared folder with child folders for each version release. Ensure that everyone works on different files.
D. Store the Terraform code in a shared Google Drive folder so it syncs automatically to every team member's computer. Organize files with a naming convention that identifies each new version.
A. Store the Terraform code in a version-control system. Establish procedures for pushing new versions and merging with the master.
C. Store the Terraform code in a Cloud Storage bucket using object versioning. Give access to the bucket to every team member so they can download the files.
A. Confirm that the Stackdriver agent has been installed in the hosting virtual machine.
D. Confirm that the application is using the required client library and the service account key has proper permissions.
C. Confirm that port 25 has been opened in the firewall to allow messages through to Stackdriver.
B. Confirm that your account has the proper permissions to use the Stackdriver dashboard.
D. Implement static code analysis tooling against the Docker files used to create the containers.
C. Reconfigure the existing operating system vulnerability software to exist inside the container.
B. Configure the containers in the build pipeline to always update themselves before release.
A. Set up Container Analysis to scan and report Common Vulnerabilities and Exposures.
D. Use larger Cloud Build virtual machines (VMs) by using the machine-type option.
C. Use multiple smaller build steps to minimize execution time.
B. Run multiple Jenkins agents to parallelize the build.
A. Use Cloud Storage to cache intermediate artifacts.
C. Upsize the virtual machines running the login services.
A. Roll back the recent release.
B. Review the Stackdriver monitoring.
D. Deploy a new release to see whether it fixes the problem.
C. Integrate the application with a Single sign-on (SSO) system and do not expose secrets to the application.
B. Inject the secret at the time of instance creation via an encrypted configuration management system.
A. Store the encryption keys in Cloud Key Management Service (KMS) and rotate the keys frequently.
D. Leverage a continuous build pipeline that produces multiple versions of the secret for each instance of the application.
C. Distribute the alerts to engineers in different time zones.
D. Redefine the related Service Level Objective so that the error budget is not exhausted.
B. Create an incident report for each of the alerts.
A. Eliminate unactionable alerts.
C. Pre-provision double the compute power used last season, expecting growth.
A. Load test the application to profile its performance for scaling.
D. Create a runbook on inflating the disaster recovery (DR) environment if there is growth.
B. Enable autoscaling on the production clusters, in case there is growth.
A. Upgrade the GCS buckets to Multi-Regional.
B. Enable high availability on the Cloud SQL instances.
C. Move the application from App Engine to Compute Engine.
D. Modify the App Engine configuration to have additional idle instances.
B. Implement Jenkins on Kubernetes on-premises.
A. Implement Jenkins on local workstations.
D. Implement Jenkins on Compute Engine virtual machines.
C. Implement Jenkins on Google Cloud Functions.
C. Create an export in Stackdriver and configure Cloud Pub/Sub to store logs in permanent storage for seven years.
A. Create a Cloud Storage bucket and develop your application to send logs directly to the bucket.
B. Develop an App Engine application that pulls the logs from Stackdriver and saves them in BigQuery.
D. Create a sink in Stackdriver, name it, create a bucket on Cloud Storage for storing archived logs, and then select the bucket as the log export destination.
B. Install the Stackdriver Error Reporting library for Python, and then run your code on Google Kubernetes Engine.
D. Use the Stackdriver Error Reporting API to write errors from your application to ReportedErrorEvent, and then generate log entries with properly formatted error messages in Stackdriver Logging.
C. Install the Stackdriver Error Reporting library for Python, and then run your code on App Engine flexible environment.
A. Install the Stackdriver Error Reporting library for Python, and then run your code on a Compute Engine VM.
A. 90th percentile - 100 ms; 95th percentile - 250 ms
B. 90th percentile - 120 ms; 95th percentile - 275 ms
D. 90th percentile - 250 ms; 95th percentile - 400 ms
C. 90th percentile - 150 ms; 95th percentile - 300 ms
A. Develop an appropriate error budget policy in cooperation with all service stakeholders.
C. Negotiate with the development team to reduce the release frequency to no more than once a week.
D. Add a plugin to your Jenkins pipeline that prevents new releases whenever your service is out of SLO.
B. Negotiate with the product team to always prioritize service reliability over releasing new features.
C. Create a Development and a Production GKE cluster in separate projects. In each cluster, create a Kubernetes namespace per team, and then configure Identity Aware Proxy so that each team can only access its own namespace.
B. Create one GCP Project per team. In each project, create a cluster with a Kubernetes namespace for Development and one for Production. Grant the teams IAM access to their respective clusters.
A. Create one GCP Project per team. In each project, create a cluster for Development and one for Production. Grant the teams IAM access to their respective clusters.
D. Create a Development and a Production GKE cluster in separate projects. In each cluster, create a Kubernetes namespace per team, and then configure Kubernetes Role-based access control (RBAC) so that each team can only access its own namespace.
D. Push the images to a private image registry running on a Compute Engine instance in the eu-west-1 region.
B. Push the images to Google Container Registry (GCR) using the us.gcr.io hostname.
C. Push the images to Google Container Registry (GCR) using the eu.gcr.io hostname.
A. Push the images to Google Container Registry (GCR) using the gcr.io hostname.
C. Enrich all instances with metadata specific to the system they run. Configure Stackdriver Logging to export to BigQuery, and query costs based on the metadata.
A. In the Google Cloud Platform Console, use the Cost Breakdown section to visualize the costs per system.
D. Name each virtual machine (VM) after the system it runs. Set up a usage report export to a Cloud Storage bucket. Configure the bucket as a source in BigQuery to query costs based on VM name.
B. Assign all instances a label specific to the system they run. Configure BigQuery billing export and query costs per label.
D. Use Cloud Key Management Service (Cloud KMS) to encrypt the secrets and include them in your Cloud Build deployment configuration. Grant Cloud Build access to the KeyRing.
C. Use client-side encryption to encrypt the secrets and store them in a Cloud Storage bucket. Store a decryption key in the bucket and grant Cloud Build access to the bucket.
B. Encrypt the secrets and store them in the application repository. Store a decryption key in a separate repository and grant Cloud Build access to the repository.
A. Create a Cloud Storage bucket and use the built-in encryption at rest. Store the secrets in the bucket and grant Cloud Build access to the bucket.
C. Add an extra node pool that consists of high memory and high CPU machine type instances to the cluster.
A. Reroute the user traffic from the affected region to other regions that don't report issues.
B. Use Stackdriver Monitoring to check for a spike in CPU or memory usage for the affected region.
D. Use Stackdriver Logging to filter on the clusters in the affected region, and inspect error messages in the logs.
D. Your opinion of the incident's severity compared to past incidents
E. Copies of the design documents for all the services impacted by the incident
B. A list of employees responsible for causing the incident
C. A list of action items to prevent a recurrence of the incident
A. An explanation of the root cause of the incident
B. Use Node taints with NoExecute.
C. Use a replica set in the deployment specification.
D. Use a stateful set with parallel pod management policy.
A. Use a partitioned rolling update.
D. As the reporting backend PD throughput capacity compared to a known-good threshold
B. As the proportion of report generation requests that result in a successful response
C. As the application's report generation queue size compared to a known-good threshold
A. As the I/O wait times aggregated across all report generation backends
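An SLI expressed as a proportion of successful requests, as one of the options above describes, can be computed as good events divided by total events. The sketch below is illustrative only; the function name and the request counts are hypothetical, and treating zero traffic as fully compliant is an assumption.

```python
def availability_sli(successful: int, total: int) -> float:
    """Return the fraction of requests that got a successful response."""
    if total == 0:
        # Assumption: with no requests, the SLI is treated as fully met.
        return 1.0
    return successful / total


# Hypothetical counts for one measurement window.
sli = availability_sli(successful=9_950, total=10_000)
print(f"{sli:.2%}")  # prints 99.50%
```

A ratio like this tracks what users experience directly, whereas queue sizes or I/O wait times only indicate it indirectly.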
C. Create a Dataflow pipeline to analyze service metrics in real time.
A. Analyze VPC flow logs along the path of the request.
D. Use a distributed tracing framework such as OpenTelemetry or Stackdriver Trace.
B. Investigate the Liveness and Readiness probes for each service.
C. Assign collaborators but no individual owners to the items to keep the postmortem blameless.
D. Assign the team lead as the owner for all action items because they are in charge of the SRE team.
B. Assign multiple owners for each item to guarantee that the team addresses items quickly.
A. Assign one owner for each action item and any necessary collaborators.
A. Introduce the new version of the API. Announce deprecation of the old version of the API. Deprecate the old version of the API. Contact remaining users of the old API. Provide best effort support to users of the old API. Turn down the old version of the API.
C. Announce deprecation of the old version of the API. Contact remaining users on the old API. Introduce the new version of the API. Deprecate the old version of the API. Provide best effort support to users of the old API. Turn down the old version of the API.
D. Introduce the new version of the API. Contact remaining users of the old API. Announce deprecation of the old version of the API. Deprecate the old version of the API. Turn down the old version of the API. Provide best effort support to users of the old API.
B. Announce deprecation of the old version of the API. Introduce the new version of the API. Contact remaining users on the old API. Deprecate the old version of the API. Turn down the old version of the API. Provide best effort support to users of the old API.
B. Use the fluent-plugin-record-reformer Fluentd output plugin to remove the fields from the log entries in flight.
D. Stage log entries to Cloud Storage, and then trigger a Cloud Function to remove the fields and write the entries to Stackdriver via the Stackdriver Logging API.
A. Use the filter-record-transformer Fluentd filter plugin to remove the fields from the log entries in flight.
C. Wait for the application developers to patch the application, and then verify that the log entries are no longer exposing PII.
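Removing fields from log entries in flight, as the filter-based options above describe, amounts to dropping selected keys from each record before it is forwarded. The Python sketch below only illustrates that transformation; it is not Fluentd configuration, and the field names and sample record are hypothetical.

```python
def strip_fields(record: dict, fields_to_remove: set) -> dict:
    """Return a copy of a log record without the listed top-level fields."""
    return {key: value for key, value in record.items()
            if key not in fields_to_remove}


# Hypothetical log record containing PII fields.
entry = {
    "severity": "INFO",
    "message": "order created",
    "user_email": "someone@example.com",
    "user_phone": "555-0100",
}
clean = strip_fields(entry, {"user_email", "user_phone"})
print(clean)  # {'severity': 'INFO', 'message': 'order created'}
```

Filtering before the entry leaves the node means the sensitive fields never reach the logging backend.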
C. Plan individual meetings with all the engineers involved. Determine who approved and pushed the new release to production.
A. Focus on developing new features rather than preventing the outages from recurring.
D. Use the Git history to find the related code commit. Prevent the engineer who made that commit from working on production services.
B. Focus on identifying the contributing causes of the incident rather than the individual responsible for the cause.
E. Announce planned downtime to consume more error budget, and ensure that users are not depending on a tighter SLO.
D. Implement and measure additional Service Level Indicators (SLIs) for the application.
A. Add more serving capacity to all of your application's zones.
B. Have more frequent or potentially risky application releases.
C. Tighten the SLO to match the application's observed reliability.
C. Shift engineering time to other services that need more reliability.
B. Increase the service's deployment velocity and/or risk.
A. Make the service's SLO more strict.
E. Change the implementation of your Service Level Indicators (SLIs) to increase coverage.
D. Get the product team to prioritize reliability work over new features.
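The SLO options above all turn on how much error budget a service has left. As a rough sketch, the remaining budget can be derived from the SLO target and the observed request counts; the function name and the numbers below are hypothetical.

```python
def error_budget_remaining(slo_target: float, good: int, total: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    allowed_bad = (1.0 - slo_target) * total  # failures the SLO tolerates
    actual_bad = total - good                 # failures actually observed
    if allowed_bad == 0:
        # Assumption: a 100% target or zero traffic leaves no budget to spend.
        return 0.0
    return 1.0 - actual_bad / allowed_bad


# Hypothetical window: 99.9% SLO, 1,000,000 requests, 500 failures.
remaining = error_budget_remaining(0.999, good=999_500, total=1_000_000)
print(f"{remaining:.0%} of the error budget is left")
```

A service that consistently leaves most of its budget unspent is more reliable than its SLO requires, which is the situation several of the options above respond to.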
1. B 2. C 3. B 4. B 5. D 6. D 7. A 8. C 9. BE 10. A 11. D 12. B 13. A 14. B 15. B 16. D -- 17. A 18. D 19. A 20. B -- 21. C 22. C 23. D 24. A 25. A 26. C 27. A 28. B 29. A 30. BE 31. A 32. B 33. C 34. B 35. A 36. C 37. C 38. B 39. D 40. B 41. DE 42. A 43. B 44. C 45. AC 46. C 47. C 48. A 49. D 50. D 51. C 52. B 53. A 54. A 55. A 56. A 57. A 58. A 59. A 60. A 61. D 62. D 63. D 64. C 65. C 66. A 67. D 68. C 69. B 70. D 71. A 72. AC 73. A 74. B 75. D 76. A 77. A 78. A 79. B 80. DE 81. BC