Google Cloud Platform & Big Data Engineer Resume, San Antonio, TX

SUMMARY

10+ years of extensive hands-on experience in the IT industry, including deployment of Hadoop ecosystem and Google Cloud components such as MapReduce, YARN, Sqoop, Flume, Pig, Hive, BigQuery and Bigtable, and 5+ years of experience with Spark, Storm, Scala and Python.
Experience in OLTP and OLAP design, development, testing, implementation and support of enterprise data warehouses.
Strong knowledge of Hadoop cluster capacity planning, performance tuning and cluster monitoring.
Extensive experience in the business data science project life cycle, including data acquisition and data cleaning.
Experience in cloud computing on Google Cloud Platform with technologies such as Dataflow, Pub/Sub, BigQuery and related tools.
Good exposure to MapReduce programming in Java, Pig Latin scripting, distributed applications and HDFS.
Good understanding of NoSQL databases and hands-on experience writing applications on the NoSQL databases HBase, Cassandra and MongoDB.
Experienced with Scala and Spark in improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, pair RDDs and Spark on YARN.
Experienced in installing, configuring, supporting and managing Hadoop clusters using Apache, Cloudera and Hortonworks distributions, Cloud Storage, and Amazon Web Services (AWS) and related technologies: DynamoDB, EMR, S3 and ML.
Experience in deploying NiFi data flows to production and integrating data from multiple sources such as Cassandra and MongoDB.
Familiar with deploying NiFi templates to environments via the NiFi REST API, integrated with other automation tools.
Experience in benchmarking Hadoop clusters to analyze queue usage.
Experienced in working with Mahout for applying machine learning techniques in the Hadoop ecosystem.
Good experience with Amazon Web Services such as Redshift, Data Pipeline and ML.
Good experience moving data in and out of Hadoop, RDBMS, NoSQL and UNIX systems using Sqoop and other traditional data movement technologies.
Experience integrating the Quartz scheduler with Oozie workflows to pull data from multiple data sources in parallel using fork.
Experience in installation, configuration, support and management of Hadoop clusters using Cloudera distributions.
Experience creating visual reports, graphical analyses and dashboards using Tableau and Informatica on historical data stored in HDFS, and performing data analysis using Splunk Enterprise.
Good experience with version control using Git; extensive knowledge of GitHub and Bitbucket.
Experienced in job scheduling and monitoring using Oozie and ZooKeeper.

TECHNICAL SKILLS

Big Data Ecosystems: HDFS, MapReduce, Pig, Hive, Impala, YARN, Oozie, ZooKeeper, Apache Spark, Apache Crunch, Apache NiFi, Apache Storm, Apache Kappa, Apache Kafka, Sqoop, Flume

Cloud Technologies: Google Cloud Platform, Pub/Sub, Dataflow, BigQuery

Scripting Languages: Python, shell

Programming Languages: Python, Java

Databases: MongoDB, Netezza, SQL Server, MySQL, Oracle, DB2

IDEs / Tools: Eclipse, JUnit, Maven, Ant, MS Visual Studio, NetBeans

Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE

Confidential, San Antonio, TX

Google Cloud Platform & Big Data Engineer

Responsibilities:

Participated in designing the Google Cloud architecture.
Designed and automated Dataflow pipelines that ingest data from real-time and batch sources.
Configured a Kubernetes cluster for deployment and execution of code.
Upgraded the existing Cassandra cluster to the latest releases.
Wrote Dataflow pipelines and transformations in the preprocessing layer.
Performed stress and performance testing and benchmarking on the cluster.
Tuned the cluster to achieve maximum throughput and minimum execution time based on the benchmarking results.
Migrated data from one data center to another.
Configured, documented and demonstrated inter-node communication between Cassandra nodes and clients using SSL encryption.

Confidential, Houston, TX

Big Data Engineer / Hadoop Developer

Responsibilities:

Used Hive queries in Spark SQL to analyze and process data.
Handled different data formats such as Avro, Parquet and ORC.
Imported and exported data between MySQL and HDFS using the ETL tool Sqoop, working with Teradata Studio and DBeaver.
Hands-on experience in installing, configuring, supporting and managing Hadoop clusters.
Implemented optimized map joins to combine data from different sources and perform cleaning operations before applying the algorithms.
Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework, and handled JSON data.
Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
Helped develop a RESTful service using the Python Flask framework.
Used Python modules such as requests, urllib and urllib2 for web crawling.
Managed and reviewed Hadoop log files.
Participated in business analysis and technical design sessions with business and technical staff to develop requirements documents and ETL design specifications.
Wrote complex SQL scripts to avoid Informatica lookups and improve performance on high data volumes.
Created and monitored sessions using Workflow Manager and Workflow Monitor.
Loaded data from the UNIX file system into HDFS.
Designed and developed Spark SQL scripts based on functional specifications.
Designed and developed extract, transform and load (ETL) mappings, procedures and schedules, following the standard development lifecycle.
Defined job flows and developed simple to complex MapReduce jobs as required.
Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
Worked with Informatica Source Analyzer, Mapping Designer, Mapplets and Transformations.
Developed end-to-end ETL batch and streaming data integration into Hadoop (MapR), transforming data.
Created highly optimized SQL queries for MapReduce jobs, matching each query to the appropriate Hive table configuration to generate efficient reports.
Worked closely with the Quality Assurance, Operations and Production Support groups to devise test plans, answer questions and resolve data or processing issues.
Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Databricks connectors, Spark Core, Spark SQL, Sqoop, Hive and NoSQL databases.
Wrote Spark SQL scripts to optimize query performance.
Well-versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN and MapReduce programming.
Implemented Hive UDFs and tuned performance for better results.
Developed and tuned SQL in HiveQL, Drill and Spark SQL.
Used Sqoop to import and export data between Oracle DB and HDFS and Hive.
Developed Spark code using Spark RDDs and Spark SQL/Streaming for faster data processing.

Confidential, Bentonville, AR

Senior Software Engineer - Big Data Engineer

Responsibilities:

Participated in Agile ceremonies and provided status to the team and product owner.
Designed and built ETL pipelines to automate the ingestion of structured and unstructured data.
Implemented and configured big data technologies and tuned processes for performance at scale.
Proficiency in and knowledge of best practices with Hadoop (YARN, HDFS, MapReduce).
Created Spark jobs to process terabytes of data every day for daily analytics.
Developed and built frameworks and prototypes that integrate big data and advanced analytics to support business decisions.
Assisted application development teams during application design and development for highly complex and critical data projects.
Created data management policies, procedures and standards.
Worked with end users to ensure the analytics transform data into knowledge in focused and meaningful ways.

Confidential, Philadelphia, PA

Data Analyst / Big Data Engineer

Responsibilities:

Created Bash and Python scripts to automate data ingestion.
Prepared delivery prerequisites to procure approvals from management.
Used the Python tools lettuce and behave for BDD testing to support defect-free delivery.
Migrated files from on-premises storage to AWS S3 to make data available for API consumption.
Used Jenkins and Git to enable versioning and build pipelines, and deployed into production.
Applied Hadoop best practices to optimize storage and processing: partitioning, bucketing, and ORC and Parquet files.
Monitored jobs using Hue for debugging and resolving issues.
Created Impala scripts to quickly retrieve ad hoc results for customers.
Consumed Kafka streams into Spark, processing the streams in batches to apply analytics.
Used the NoSQL database HBase to perform CRUD operations in maintaining customer data.
Delivered reports that saved customers $1M in costs.
Achieved 97% customer satisfaction on the work delivered.
Optimized Hive and

Confidential, McLean, VA

Sr. Hadoop/Spark Developer

Responsibilities:

Deployed systems on the Amazon Web Services (AWS) infrastructure service EC2.
Configured and deployed web applications on AWS servers using SBT and Play.
Migrated MapReduce jobs into Spark RDD transformations using Scala.
Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Developed Spark code using Spark RDDs and Spark SQL/Streaming for faster data processing.
Performed configuration, deployment and support of cloud services including Amazon Web Services (AWS).
Working knowledge of various AWS technologies such as SQS queuing, SNS notification, S3 storage, Redshift, Data Pipeline and EMR.
Responsible for all public (AWS) and private (OpenStack/VMware/DCOS/Mesos/Marathon) cloud infrastructure.
Developed a Flume ETL job handling data from an HTTP source with HDFS as the sink, and configured data pipelining.
Used the Hive data warehouse tool to analyze the unified historical data in HDFS to identify issues and behavioral patterns.
Helped develop a RESTful service using the Python Flask framework.
Experienced in working with Python GUI frameworks such as PyJamas.
Experienced in using Apache Drill in data-intensive distributed applications for interactive analysis of large-scale datasets.
Developed end-to-end ETL batch and streaming data integration into Hadoop (MapR), transforming data.
Used Python modules such as requests, urllib and urllib2 for web crawling.
Tools used extensively include Spark, Drill, Hive, HBase, Kafka and MapR Streams, PostgreSQL and StreamSets.

Confidential, Chesterfield, MI

Hadoop Developer

Responsibilities:

Well-versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN and MapReduce programming.
Developed MapReduce programs to remove irregularities and aggregate the data.
Developed cluster coordination services using ZooKeeper.
Implemented Hive UDFs and tuned performance for better results.
Developed Pig Latin scripts to extract data from log files and store it in HDFS. Created user-defined functions (UDFs) to pre-process data for analysis.
Implemented optimized map joins to combine data from different sources and perform cleaning operations before applying the algorithms.
Created highly optimized SQL queries for MapReduce jobs, matching each query to the appropriate Hive table configuration to generate efficient reports.
Used packages such as Beautiful Soup for data parsing in Python.
Developed and tuned SQL in HiveQL, Drill and Spark SQL.
Used Sqoop to import and export data between Oracle DB and HDFS, Hive and HBase.
Implemented CRUD operations on HBase data using the Thrift API to get real-time insights.
Identified data sources for various senior management reports and wrote complex SQL queries.
