
Google Cloud Platform & Big Data Engineer Resume

San Antonio, TX

SUMMARY

  • Over 10 years of extensive hands-on experience in the IT industry, including deployment of Hadoop ecosystems and Google Cloud computing services such as MapReduce, YARN, Sqoop, Flume, Pig, Hive, BigQuery, and Bigtable, with 5+ years of experience in Spark, Storm, Scala, and Python.
  • Experience in OLTP and OLAP design, development, testing, implementation and support of enterprise Data warehouses.
  • Strong Knowledge in Hadoop Cluster Capacity Planning, Performance Tuning, Cluster Monitoring.
  • Extensive experience in business data science project life cycle including Data Acquisition, Data Cleaning.
  • Experience in cloud computing on Google Cloud Platform with technologies such as Dataflow, Pub/Sub, BigQuery, and related tools.
  • Good exposure to MapReduce programming using Java, Pig Latin scripting, distributed applications, and HDFS.
  • Good understanding of NoSQL databases and hands-on experience writing applications against HBase, Cassandra, and MongoDB.
  • Experienced with Scala and Spark, improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, pair RDDs, and Spark on YARN.
  • Experienced in installation, configuration, support, and management of Hadoop clusters using Apache Cloudera and Hortonworks distributions, Cloud Storage, and Amazon Web Services (AWS), along with related technologies such as DynamoDB, EMR, S3, and ML.
  • Experience in deploying NiFi data flows to production and integrating data from multiple sources such as Cassandra and MongoDB.
  • Deployed templates to environments via the NiFi REST API integrated with other automation tools.
  • Experience in benchmarking Hadoop clusters to analyze queue usage.
  • Experienced in working with Mahout for applying machine learning techniques in the Hadoop Ecosystem.
  • Good experience with Amazon Web Services such as Redshift, Data Pipeline, and ML.
  • Experienced in moving data in and out of Hadoop from RDBMS, NoSQL, and UNIX systems using Sqoop and other traditional data movement technologies.
  • Experience integrating the Quartz scheduler with Oozie workflows to pull data from multiple data sources in parallel using fork actions.
  • Experience in installation, configuration, support, and management of Hadoop clusters using Cloudera distributions.
  • Experience creating visual reports, graphical analyses, and dashboard reports with Tableau and Informatica on historical data stored in HDFS, and performing data analysis using Splunk Enterprise.
  • Good experience with version control services such as Git; extensive knowledge of GitHub and Bitbucket.
  • Experienced in job scheduling and monitoring using Oozie, Zookeeper.

TECHNICAL SKILLS

Big Data Ecosystems: Apache Spark, HDFS, MapReduce, Pig, Hive, Impala, YARN, Oozie, ZooKeeper, Apache Crunch, Apache NiFi, Apache Storm, Apache Kafka, Sqoop, Flume

Cloud Technologies: Google Cloud Platform, Pub/Sub, Dataflow, BigQuery

Scripting Languages: Python, Shell

Programming Languages: Python, Java

Databases: MongoDB, Netezza, SQL Server, MySQL, ORACLE, DB2

IDEs / Tools: Eclipse, JUnit, Maven, Ant, MS Visual Studio, NetBeans

Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE

Confidential, San Antonio, TX

Google Cloud Platform & Big Data Engineer

Responsibilities:

  • Involved in the process of designing Google Cloud Architecture.
  • Designed and automated dataflow pipelines that ingest data from both real-time and batch sources.
  • Configured a Kubernetes cluster for deployment and execution of code.
  • Upgraded the existing Cassandra cluster to the latest releases.
  • Wrote dataflow pipelines and transformations for the preprocessing layer.
  • Performed stress and performance testing and benchmarking on the cluster.
  • Tuned the cluster to achieve maximum throughput and minimum execution time based on the benchmarking results.
  • Migrated data from one datacenter to another.
  • Configured, documented, and demonstrated inter-node communication between Cassandra nodes and clients using SSL encryption.
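
The preprocessing step in such a dataflow pipeline can be sketched in plain Python (a minimal illustration with a hypothetical record schema; a production pipeline would implement these steps as Apache Beam transforms running on Dataflow):

```python
import json

def parse_event(raw):
    """Parse one raw Pub/Sub-style message; return None for malformed input."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Require the fields that downstream stages depend on (hypothetical schema).
    if "user_id" not in event or "ts" not in event:
        return None
    return event

def preprocess(batch):
    """Drop malformed records and normalize field names for BigQuery loading."""
    cleaned = []
    for raw in batch:
        event = parse_event(raw)
        if event is None:
            continue  # bad records are filtered out, not allowed to fail the job
        cleaned.append({
            "user_id": str(event["user_id"]),
            "timestamp": int(event["ts"]),
        })
    return cleaned
```

The same parse/filter/normalize shape applies whether the batch arrives from a Pub/Sub subscription or a Cloud Storage file, which is what lets one pipeline serve both the streaming and batch paths.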

Confidential, Houston, TX

Big Data Engineer / Hadoop developer

Responsibilities:

  • Used Hive queries in Spark SQL for analyzing and processing the data.
  • Responsible for handling different data formats such as Avro, Parquet, and ORC.
  • Worked on import and export of data between MySQL and HDFS using the ETL tool Sqoop, with Teradata Studio and DBeaver.
  • Hands-on experience in installation, configuration, support, and management of Hadoop clusters.
  • Implemented optimized map joins to combine data from different sources and perform cleaning operations before applying the algorithms.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Enhanced and optimized production Spark code to aggregate, group, and run data mining tasks using the Spark framework, and handled JSON data.
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster processing of data.
  • Involved in developing a RESTful service using the Python Flask framework.
  • Used Python modules such as requests, urllib, and urllib2 for web crawling.
  • Experienced in managing and reviewing Hadoop log files
  • Involved in business analysis and technical design sessions with business and technical staff to develop requirements document and ETL design specifications.
  • Wrote complex SQL scripts to avoid Informatica Look-ups to improve the performance as the volume of the data was heavy.
  • Created and monitored sessions using workflow manager and workflow monitor.
  • Involved in loading data from UNIX file system to HDFS
  • Responsible for design & development of Spark SQL Scripts based on Functional Specifications
  • Design and develop extract, transform, and load (ETL) mappings, procedures, and schedules, following the standard development lifecycle
  • Defined job flows and developed simple to complex Map Reduce jobs as per the requirement.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
  • Worked on Informatica Source Analyzer, Mapping Designer & Mapplet, and Transformations
  • Developed end to end ETL batch and streaming data integration into Hadoop (MapR), transforming data
  • Created highly optimized SQL queries for MapReduce jobs, matching each query to the appropriate Hive table configuration to generate efficient reports.
  • Worked closely with Quality Assurance, Operations, and Production Support groups to devise test plans, answer questions, and resolve data or processing issues.
  • Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Databricks connectors, Spark Core, Spark SQL, Sqoop, Hive, and NoSQL databases.
  • Wrote Spark SQL scripts to optimize query performance.
  • Well-versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming.
  • Implemented Hive UDFs and performed tuning for better results.
  • Tuned and developed SQL on HiveQL, Drill, and Spark SQL.
  • Experience in using Sqoop to import and export data between Oracle DB and HDFS/Hive.
  • Developed Spark code using Spark RDDs and Spark SQL/Streaming for faster processing of data.
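
The optimized map joins used above rely on broadcasting the small dimension table to every mapper so each record joins in memory with no shuffle. A simplified pure-Python sketch of that idea (in practice this is Hive's MAPJOIN hint or Spark's broadcast join; the field names are hypothetical):

```python
def broadcast_join(facts, dim, key):
    """Join fact records against a small dimension table held in memory,
    mimicking a map-side (broadcast) join: no sort or shuffle phase."""
    lookup = {row[key]: row for row in dim}      # "broadcast" side, built once
    joined = []
    for fact in facts:                           # the streamed, large side
        dim_row = lookup.get(fact[key])
        if dim_row is not None:                  # inner-join semantics
            joined.append({**fact, **dim_row})
    return joined
```

The win is that the large side is scanned exactly once and never repartitioned, which is why map joins pay off whenever one side fits in a task's memory.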

Confidential, Bentonville, AR

Senior Software Engineer - Big Data Engineer

Responsibilities:

  • Participated in Agile ceremonies and provided status to the team and product owner.
  • Designed and built ETL pipelines to automate the ingestion of structured and unstructured data.
  • Implemented and configured big data technologies and tuned processes for performance at scale.
  • Applied best practices with Hadoop (YARN, HDFS, MapReduce).
  • Created Spark jobs to process TBs of data every day for daily analytics.
  • Developed and built frameworks and prototypes that integrate big data and advanced analytics to drive business decisions.
  • Assisted application development teams during design and development of highly complex and critical data projects.
  • Created data management policies, procedures, and standards.
  • Worked with end users to ensure the analytics transform data into knowledge in focused and meaningful ways.
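
One way to picture an ingestion pipeline that handles both structured and unstructured sources is a router that picks a parser per input type. A toy sketch (the routing rule and record shapes here are hypothetical; a real pipeline would dispatch to Avro/Parquet readers or text tokenizers):

```python
import csv
import io

def ingest(name, payload):
    """Route a file to a structured or unstructured parser by extension."""
    if name.endswith(".csv"):
        # Structured path: parse rows into dictionaries keyed by the header.
        return list(csv.DictReader(io.StringIO(payload)))
    # Unstructured path: fall back to raw non-empty lines for text analytics.
    return [{"line": ln} for ln in payload.splitlines() if ln.strip()]
```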

Confidential, Philadelphia, PA

Data analyst/ Big Data Engineer

Responsibilities:

  • Created bash and python scripts for automation of data ingestion
  • Prepared delivery prerequisites to procure approvals from the management.
  • Used the Python lettuce and behave libraries for BDD testing to support defect-free delivery.
  • Migrated files from on-prem storage to AWS S3 to enable data for API consumption.
  • Used Jenkins and Git for versioning and build pipelines, and deployed into production.
  • Followed Hadoop best practices to optimize storage and processing: partitioning, bucketing, and the ORC and Parquet file formats.
  • Monitored jobs using Hue for debugging and resolving issues.
  • Created Impala scripts to quickly retrieve ad-hoc results for customers.
  • Consumed Kafka streams into Spark for applying analytics to batch and streaming data.
  • Used the NoSQL database HBase to perform CRUD operations for maintaining customer data.
  • Delivered reports that saved customers $1M in costs.
  • Achieved 97% customer satisfaction on the work delivered.
  • Optimized Hive and …
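
The partitioning best practice above comes down to laying data out so queries can prune whole directories. A sketch of how Hive-style partition paths are derived (illustrative only; Hive and Spark generate these paths themselves when you write with PARTITIONED BY or partitionBy):

```python
def partition_path(table_root, record, partition_keys):
    """Build a Hive-style partition directory (key=value path segments)
    for a record, so queries filtering on these keys skip other dirs."""
    segments = [f"{k}={record[k]}" for k in partition_keys]
    return "/".join([table_root] + segments)
```

A query such as `WHERE year = 2020 AND month = 1` then only needs to read files under the matching `year=2020/month=1` directory.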

Confidential, McLean, VA

Sr. Hadoop/Spark Developer

Responsibilities:

  • Involved in deploying systems on Amazon Web Services (AWS) infrastructure services such as EC2.
  • Experience configuring and deploying web applications on AWS servers using SBT and Play.
  • Migrated MapReduce jobs into Spark RDD transformations using Scala.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Spark code using Spark RDDs and Spark SQL/Streaming for faster processing of data.
  • Performed configuration, deployment, and support of cloud services on Amazon Web Services (AWS).
  • Working knowledge of various AWS technologies such as SQS queuing, SNS notifications, S3 storage, Redshift, Data Pipeline, and EMR.
  • Responsible for all public (AWS) and private (OpenStack/VMware/DC/OS/Mesos/Marathon) cloud infrastructure.
  • Developed Flume ETL jobs to handle data from an HTTP source with an HDFS sink, and configured data pipelining.
  • Used the Hive data warehouse tool to analyze unified historical data in HDFS to identify issues and behavioral patterns.
  • Involved in developing a RESTful service using the Python Flask framework.
  • Experienced in working with the Python GUI framework PyJamas.
  • Experienced in using Apache Drill for interactive analysis of large-scale datasets in data-intensive distributed applications.
  • Developed end-to-end ETL batch and streaming data integration into Hadoop (MapR), transforming data.
  • Used Python modules such as requests, urllib, and urllib2 for web crawling.
  • Tools used extensively included Spark, Drill, Hive, HBase, Kafka & MapR Streams, PostgreSQL, and StreamSets.
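
The web-crawling work with urllib can be sketched with the standard library alone: fetch pages with urllib.request and pull links out with html.parser (a minimal sketch; the original code used urllib/urllib2 under Python 2, and a real crawler would add politeness delays, retries, and deduplication):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href=...> tags on one page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative hrefs against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))

def extract_links(base_url, html):
    """Parse the HTML of a fetched page (e.g. from urllib.request.urlopen)."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```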

Confidential, Chesterfield, MI

Hadoop Developer

Responsibilities:

  • Well-versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming.
  • Developed Map-Reduce programs to get rid of irregularities and aggregate the data.
  • Developed Cluster coordination services through Zookeeper.
  • Implemented Hive UDFs and performed tuning for better results.
  • Developed Pig Latin scripts to extract data from log files and store it in HDFS; created User Defined Functions (UDFs) to pre-process data for analysis.
  • Implemented optimized map joins to combine data from different sources and perform cleaning operations before applying the algorithms.
  • Created highly optimized SQL queries for MapReduce jobs, matching each query to the appropriate Hive table configuration to generate efficient reports.
  • Used packages such as BeautifulSoup for data parsing in Python.
  • Tuned and developed SQL on HiveQL, Drill, and Spark SQL.
  • Experience in using Sqoop to import and export data between Oracle DB and HDFS, Hive, and HBase.
  • Implemented CRUD operations on HBase data using the Thrift API to get real-time insights.
  • Identified data sources for various reports for senior management and wrote complex SQL queries.
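
The Hive UDF work above centers on per-row transforms; such logic is often prototyped as a plain function before being ported to a Java UDF. A hypothetical example, masking account numbers down to their last four digits:

```python
def mask_account(value, visible=4):
    """Per-row transform in the style of a Hive UDF: keep the last `visible`
    characters and mask the rest, passing NULLs (None) through unchanged."""
    if value is None:          # a Hive UDF must tolerate NULL inputs
        return None
    text = str(value)
    if len(text) <= visible:
        return text
    return "*" * (len(text) - visible) + text[-visible:]
```

In HiveQL the ported UDF would then be applied per row, e.g. `SELECT mask_account(acct_no) FROM accounts`.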
