Understand TCO (total cost of ownership), CapEx (capital expenditure), and OpEx (operational expenditure); refer to the video in the References.
Infrastructure cost: compare fixed upfront cost (e.g. on-prem) with variable, pay-as-you-go cost (e.g. GCP).
Operational cost: the staff effort for monitoring and incident management is usually higher on-prem than on GCP, since GCP manages much of the underlying infrastructure.
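A simple break-even calculation makes the fixed-versus-variable comparison concrete. The following is a minimal Python sketch. It assumes a simplified model in which TCO is upfront CapEx plus cumulative monthly OpEx, with staff effort folded into OpEx. The function names and the figures in the usage note are hypothetical.

```python
from typing import Optional, Sequence


def tco(upfront_capex: float, monthly_opex: Sequence[float]) -> float:
    """Total cost of ownership: one-off capital spend plus all operating spend."""
    return upfront_capex + sum(monthly_opex)


def break_even_month(onprem_capex: float, onprem_monthly_opex: float,
                     cloud_monthly_cost: float, horizon_months: int) -> Optional[int]:
    """First month in which cumulative on-prem TCO is at or below cloud TCO.

    Returns None if on-prem never catches up within the horizon, i.e. the
    pay-as-you-go option stays cheaper for the whole period considered.
    Simplified model (assumption): flat monthly costs, no hardware refresh.
    """
    for month in range(1, horizon_months + 1):
        onprem = tco(onprem_capex, [onprem_monthly_opex] * month)
        cloud = tco(0.0, [cloud_monthly_cost] * month)  # no upfront cost in the cloud
        if onprem <= cloud:
            return month
    return None
```

Take hypothetical figures of 120,000 upfront plus 4,000 per month on-prem, against 9,000 per month on GCP. With these, `break_even_month(120_000, 4_000, 9_000, 36)` returns 24, so over any shorter horizon the variable-cost option is cheaper.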
Control access to Billing, for example:
The finance team has view-only access to Billing.
The billing management team can create billing accounts and attach/detach billing accounts to/from projects.
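The access split above can be expressed as IAM policy bindings. This minimal Python sketch uses the predefined Cloud Billing roles `roles/billing.viewer`, `roles/billing.admin`, `roles/billing.creator`, and `roles/billing.projectManager`. The group addresses are hypothetical. The bindings are meant to be merged into the existing policies of the billing account and the organization, not to replace them.

```python
import json

# Predefined Cloud Billing IAM role IDs.
BILLING_VIEWER = "roles/billing.viewer"                   # Billing Account Viewer: read-only
BILLING_ADMIN = "roles/billing.admin"                     # Billing Account Administrator
BILLING_CREATOR = "roles/billing.creator"                 # Billing Account Creator (org level)
PROJECT_BILLING_MANAGER = "roles/billing.projectManager"  # link/unlink projects to billing


def billing_account_bindings(finance: str, billing_team: str) -> dict:
    """Bindings to merge into the billing account's IAM policy."""
    return {"bindings": [
        {"role": BILLING_VIEWER, "members": [finance]},
        {"role": BILLING_ADMIN, "members": [billing_team]},
    ]}


def organization_bindings(billing_team: str) -> dict:
    """Bindings to merge into the organization's IAM policy, so the billing team
    can create billing accounts and change which account a project uses."""
    return {"bindings": [
        {"role": BILLING_CREATOR, "members": [billing_team]},
        {"role": PROJECT_BILLING_MANAGER, "members": [billing_team]},
    ]}


if __name__ == "__main__":
    # Hypothetical Google Groups; replace with your own.
    print(json.dumps(billing_account_bindings("group:finance@example.com",
                                              "group:billing-admins@example.com"), indent=2))
```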
Set up accountability so you can identify how much has been spent by each project and by each team.
Create a repeatable mechanism for reviewing and optimizing your bills through:
Visibility: define processes to understand your bills.
Activity: act on the current bills to optimize them.
Feedback: automate the whole process, or repeat the same process manually every month or quarter.
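The feedback step can start as a small script that compares two billing periods. This Python sketch assumes per-project totals have already been extracted, e.g. from the BigQuery billing export. The 20% default threshold and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def flag_increases(previous: Dict[str, float], current: Dict[str, float],
                   threshold_pct: float = 20.0) -> List[Tuple[str, float, float]]:
    """Return (project, previous, current) for projects whose spend grew by more
    than threshold_pct (a hypothetical default) between two periods."""
    flagged = []
    for project, now in sorted(current.items()):
        before = previous.get(project, 0.0)
        if before == 0.0:
            # New project with no spend last period: always surface it for review.
            if now > 0.0:
                flagged.append((project, before, now))
        elif (now - before) / before * 100.0 > threshold_pct:
            flagged.append((project, before, now))
    return flagged
```

Running this on each period's totals and posting the result to the owning teams closes the loop without manual spreadsheet work.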
Analyze your bills
Export detailed billing data to BigQuery for in-depth analysis.
Structure your GCP setup with an organization, environments, folders, projects, and applications. Follow this link for best practices on setting up organizations on GCP.
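In practice you would run this analysis as SQL in BigQuery. To show the logic in a self-contained way, the Python sketch below aggregates rows shaped like a small subset of the standard billing export schema: `project.id`, `cost`, `credits[].amount`, and `labels[].key/value`. The `team` label key is a hypothetical convention. Credits are assumed to be negative amounts, as in the export.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def net_cost(row: dict) -> float:
    # Credits are exported as negative amounts, so adding them yields the net cost.
    return row["cost"] + sum(c["amount"] for c in (row.get("credits") or []))


def label_value(row: dict, key: str, default: str = "unlabeled") -> str:
    for label in row.get("labels") or []:
        if label["key"] == key:
            return label["value"]
    return default


def cost_by_project_and_team(rows: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Net cost per (project id, value of the hypothetical 'team' label)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for row in rows:
        # Some charges (e.g. account-level ones) have no project attached.
        project = row["project"]["id"] if row.get("project") else "no-project"
        totals[(project, label_value(row, "team"))] += net_cost(row)
    return dict(totals)
```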
Practical optimization methods
High level: the following methods do not require changes to your architecture or applications.
Create an organizational structure to understand where costs come from. This is useful when you want to charge costs back to each cost center.
Cloud SQL committed use discounts give you a 25% discount off of on-demand pricing for a one-year commitment and a 52% discount off of on-demand pricing for a three-year commitment.
Create alerts that notify the relevant teams when bills cross certain thresholds. Notifications can be built-in (e.g. email) or extended through Pub/Sub (e.g. Slack, SMS). Alerts can be based on actual or forecasted spend.
Set quotas for certain products and services to avoid overspending. For example, this link has details on how to set quotas for BigQuery.
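For the Pub/Sub route, budget notifications arrive as JSON messages. The Python sketch below decodes the base64 message data and builds a chat-style text. It assumes the payload fields `budgetDisplayName`, `costAmount`, `budgetAmount`, `currencyCode`, `alertThresholdExceeded`, and `forecastThresholdExceeded`. Posting the text to Slack or SMS is out of scope, and `format_budget_alert` is a hypothetical name.

```python
import base64
import json
from typing import Optional


def format_budget_alert(message_data: str) -> Optional[str]:
    """Decode a base64 Pub/Sub budget notification into a short alert text.

    Returns None when no threshold has been crossed. Field names are assumed
    from the budget notification format.
    """
    payload = json.loads(base64.b64decode(message_data).decode("utf-8"))
    name = payload.get("budgetDisplayName", "unnamed budget")
    currency = payload.get("currencyCode", "")
    cost = payload.get("costAmount", 0.0)
    budget = payload.get("budgetAmount", 0.0)
    actual = payload.get("alertThresholdExceeded")
    forecast = payload.get("forecastThresholdExceeded")
    if actual is not None:
        return (f"[{name}] actual spend {cost:.2f} {currency} "
                f"crossed {actual:.0%} of {budget:.2f} {currency}")
    if forecast is not None:
        return f"[{name}] forecasted spend crossed {forecast:.0%} of {budget:.2f} {currency}"
    return None
```

Budgets may publish notifications periodically even when no threshold has been crossed. That is why the function returns None when neither threshold field is present, so the relevant team is only messaged when it matters.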
Medium level: these are tactical methods to optimize your cost immediately.
Apply labels to all resources: this is easier to achieve with infrastructure as code (IaC), e.g. GKE template files, Deployment Manager YAML, or Terraform. Labels allow you to filter and aggregate costs per label.
Steps:
Identify the top 5 cost components.
Provide justifications for changes (e.g. new workloads, new projects, traffic spikes due to newly launched products).
Filters: exclude credits and tax to view actual costs.
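The first and third steps can be combined by ranking services by cost while leaving out tax and credits. This Python sketch again assumes rows shaped like the billing export: `service.description`, `cost`, and `cost_type`. Credits are excluded by using the gross `cost` and ignoring the `credits` array.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_cost_components(rows: Iterable[dict], n: int = 5) -> List[Tuple[str, float]]:
    """Top-n services by gross cost, excluding tax rows and credits."""
    totals: Counter = Counter()
    for row in rows:
        if row.get("cost_type") == "tax":
            continue  # exclude tax rows
        # Use the gross cost and ignore credits so they do not mask real usage.
        totals[row["service"]["description"]] += row["cost"]
    return totals.most_common(n)
```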
For Google Compute Engine (GCE) or Google Kubernetes Engine (GKE)
Start/stop resources when they are not needed.
[Update] Google has made this easier with a built-in GCE console feature for managing start/stop schedules through a GUI. Please follow this page for details.
This ref has a tutorial implementing one solution. This is a very useful method: reducing running time from 24 hours x 7 days (168 hours) to 12 hours x 5 days (60 hours) saves about 64% of the compute cost. Note that persistent disks and reserved static IP addresses are still billed while an instance is stopped. This method is particularly useful for resources such as jump hosts or pre-PROD (e.g. DEV/UAT/QA) projects.
Place pre-PROD projects in lower-cost regions (e.g. US) to run the same resources for less. For example, the same GCE N2 instance is 19% cheaper in Iowa than in Singapore.
Break a single big instance into smaller instances and enable autoscaling, so you pay only for the capacity you actually use rather than for capacity provisioned for peak load.
Leverage preemptible VMs where possible, especially for fault-tolerant workloads such as batch jobs, container workloads, or retryable jobs.
Choose the right storage options such as HDD, SSD, or GCS.
Consider serverless options such as Cloud Functions or Cloud Run, which scale down to zero when there is no traffic.
Miscellaneous: delete unused static IP addresses, unattached disks, and other unused resources.
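A short sketch can check both the scheduling logic and the savings figure above. This Python example assumes a hypothetical weekday 08:00-20:00 schedule. Actually starting and stopping instances, whether through the console's schedules or the Compute Engine API, is out of scope. The savings apply to compute charges only.

```python
from datetime import datetime, time

# Hypothetical office-hours schedule: Monday-Friday, 08:00-20:00 (12 hours/day).
RUN_DAYS = {0, 1, 2, 3, 4}  # datetime.weekday(): Monday=0 ... Sunday=6
START, STOP = time(8, 0), time(20, 0)


def should_run(now: datetime) -> bool:
    """True if the instance should be running at `now` under the schedule."""
    return now.weekday() in RUN_DAYS and START <= now.time() < STOP


def weekly_saving(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of always-on (24 x 7) compute hours avoided by the schedule."""
    return 1 - (hours_per_day * days_per_week) / (24 * 7)
```

`weekly_saving(12, 5)` is 1 - 60/168, about 0.643, matching the roughly 64% figure above. A periodic job would call `should_run` and start or stop instances accordingly.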
Database
Start/stop DB instances when not in use, applying the same method used for GCE above. This page has instructions on automating the start/stop of Cloud SQL instances on a schedule.
For memory-bound database workloads, use memory-optimized machine types instead of standard or compute-optimized ones.
BigQuery supports several ways to ingest data, such as batch loading and real-time streaming (i.e. streaming inserts). Use streaming only when it is needed, because streaming inserts are charged while batch loading is free.
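Consider BigQuery flat-rate pricing (e.g. Flex Slots instead of on-demand) if the workload processes high volumes of data. Flex Slots provide unlimited bytes processed for a fixed, predictable duration and cost. Alternatively, consider monthly or annual flat-rate commitments for long-term workloads. Google has since replaced flat-rate pricing and Flex Slots with BigQuery editions, so check the current capacity pricing model.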
Remove duplicate data.
Create a data retention policy to archive or remove old data.
Optimize queries by selecting specific columns instead of SELECT * and by filtering early; LIMIT does not reduce the amount of data scanned.
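The column and LIMIT advice follows from how BigQuery stores and bills data. It stores data by column, and on-demand queries are billed on the bytes read from the referenced columns. The Python sketch below gives a rough estimate under that model for an unpartitioned, unclustered table. The column sizes are hypothetical, and real billing also applies a minimum per table referenced.

```python
from typing import Dict, Iterable, Optional


def estimate_bytes_scanned(column_bytes: Dict[str, int],
                           selected: Optional[Iterable[str]] = None) -> int:
    """Rough bytes billed for an on-demand query over an unpartitioned, unclustered table.

    `column_bytes` maps each column name to its stored size in bytes.
    `selected=None` models SELECT *; otherwise only the listed columns are read.
    Adding LIMIT would not change this estimate: the referenced columns are
    still read in full.
    """
    columns = column_bytes.keys() if selected is None else selected
    return sum(column_bytes[name] for name in columns)


if __name__ == "__main__":
    # Hypothetical table: a large payload column next to two small ones.
    sizes = {"id": 8_000_000, "ts": 8_000_000, "payload": 900_000_000}
    print(estimate_bytes_scanned(sizes))                # SELECT *
    print(estimate_bytes_scanned(sizes, ["id", "ts"]))  # SELECT id, ts
```

With these hypothetical sizes, dropping the unused `payload` column cuts the scanned bytes from 916 MB to 16 MB, while a LIMIT clause alone would leave the figure unchanged.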
Storage
GCS
Select the right storage classes and apply data lifecycle and retention policies.
Pick the right storage type for the right workload, for example:
Choose between multi-region and regional locations.
Select the bucket location to optimize egress cost.
Choose among the Standard, Nearline, Coldline, and Archive storage classes.
Logs, snapshots
Balance snapshot frequency and retention against regulatory or business requirements.
Archive old data (logs, snapshots) to GCS.
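Lifecycle rules are declared as JSON. The Python sketch below builds a configuration in the `{"rule": [...]}` shape that `gsutil lifecycle set` accepts. It moves objects to colder storage classes as they age and finally deletes them. All day thresholds are hypothetical and should follow your own retention requirements.

```python
import json


def lifecycle_policy(nearline_after_days: int = 30, coldline_after_days: int = 90,
                     archive_after_days: int = 365, delete_after_days: int = 2555) -> dict:
    """Cloud Storage lifecycle configuration; all day thresholds are hypothetical.

    `age` is the number of days since object creation.
    """
    return {
        "rule": [
            {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
             "condition": {"age": nearline_after_days, "matchesStorageClass": ["STANDARD"]}},
            {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
             "condition": {"age": coldline_after_days, "matchesStorageClass": ["NEARLINE"]}},
            {"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
             "condition": {"age": archive_after_days, "matchesStorageClass": ["COLDLINE"]}},
            {"action": {"type": "Delete"}, "condition": {"age": delete_after_days}},
        ]
    }


if __name__ == "__main__":
    # Write this output to a file and apply it to a bucket.
    print(json.dumps(lifecycle_policy(), indent=2))
```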
Networking
Avoid unnecessary cross-region data transfer and processing. Use VPC Flow Logs to identify unwanted cross-region traffic.
Choose between the Standard and Premium network service tiers.
Use Cloud CDN to cache static content. The same caching approach applies to other solutions, such as an in-memory cache with Cloud Memorystore (Redis) or the Apigee cache.
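To find cross-region traffic in VPC Flow Logs, you can sum `bytes_sent` per (source region, destination region) pair. This Python sketch assumes log entries exported as JSON with the fields `jsonPayload.reporter`, `src_instance.region`, `dest_instance.region`, and `bytes_sent`. It counts only source-reported records, to avoid double counting flows that are logged at both ends. This assumes flow logs are enabled on the source subnets.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def cross_region_bytes(entries: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Sum bytes sent per (source region, destination region) for VM-to-VM flows."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for entry in entries:
        payload = entry.get("jsonPayload", {})
        # Count each flow once: keep only records reported by the sending VM.
        if payload.get("reporter") != "SRC":
            continue
        src = payload.get("src_instance", {}).get("region")
        dst = payload.get("dest_instance", {}).get("region")
        # Flows to/from non-VM endpoints lack one of the annotations; skip them.
        if src and dst and src != dst:
            # Integer fields may be serialized as strings in exported logs.
            totals[(src, dst)] += int(payload.get("bytes_sent", 0))
    return dict(totals)
```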
Licenses
Bring your own license (BYOL), such as Microsoft licenses.
Run SQL Server on Linux to avoid Windows Server license costs, or migrate from Oracle/SQL Server to PostgreSQL/MySQL.
Migrate .NET Framework applications to .NET Core, then containerize them or run them on Cloud Functions.
Hygiene
Delete old projects, VMs, DBs, logs, or any unused resources.
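You can list cleanup candidates from the JSON output of `gcloud compute disks list --format=json` and `gcloud compute addresses list --format=json`. The Python sketch below makes two assumptions. Disks attached to instances are assumed to have a non-empty `users` field. Reserved but unassigned addresses are assumed to report `status` as `RESERVED`. Review the results before deleting anything.

```python
from typing import List


def unattached_disks(disks: List[dict]) -> List[str]:
    """Names of disks with no attached instances (empty or missing `users`)."""
    return [d["name"] for d in disks if not d.get("users")]


def unused_addresses(addresses: List[dict]) -> List[str]:
    """Names of static IP addresses that are reserved but not assigned to a resource."""
    return [a["name"] for a in addresses if a.get("status") == "RESERVED"]
```

Pass each function the list produced by `json.load` on the saved gcloud output.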
Detailed level: review and upgrade architectures or applications.
Identify misused GCP or third-party services.
Adopt containers.
Adopt serverless.
References
This public YouTube video has useful details on cost management at a high level, as well as practical, detailed instructions on analyzing your bills with the available tools, such as the Billing console.
This YouTube playlist has more videos on cost management on GCP.