Data Engineer Resume
Sunnyvale, CA
SUMMARY:
- 5 years of experience in the Big Data ecosystem (Hadoop, Spark, Kafka) on Linux and Mac OS.
- Experience with Big Data / Hadoop ecosystem technologies including MapReduce, Spark, Cloudera Manager, Cassandra, ZooKeeper, Sqoop, Oozie, GCP, Amazon Web Services, and S3.
- In-depth knowledge of Spark and Kafka architecture and their components, such as the driver program, cluster manager, executors, and brokers.
- Experience across core data engineering responsibilities, with strong skills in data cleaning and data preprocessing.
- Excellent knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
- Experience using Spark resource managers such as YARN and Mesos.
- Experience developing MapReduce applications for analyzing Big Data in different file formats.
- Experience working with Docker and managing Docker images on clusters.
- Experience working with GitHub.
- Experience using Linux and Mac OS, including command-line work with curl and wget.
- Experience working with NoSQL databases.
- Experience with cloud service providers such as Amazon AWS, Google GCP, and McQueen.
- In-depth understanding of installing and configuring Google GCP clusters and creating clusters on VMs; knowledge of the public cloud offerings of Google Cloud Platform, including its APIs and enterprise integration points.
- Managed automated backups and created backup snapshots when needed.
- Worked on a variety of use cases involving different components of Hadoop.
- Developed and implemented core API services using Scala and Spark.
- Strong experience working with Spark 1.6 and 2.x using Scala and PySpark.
- Experienced in developing in Scala and Python (PySpark) with IDEs such as IntelliJ IDEA and PyCharm.
- Strong knowledge of functional programming in Scala and of PySpark development.
- Created Spark sessions and worked with RDDs and DataFrames across a variety of file formats, including ORC, Parquet, and Avro (see the sketch at the end of this summary).
- Built complex NiFi pipelines for pulling data from and pushing data to various data sources.
- Experience building and maintaining multiple Spark clusters (prod, dev, etc.) of different sizes and configurations, and setting up rack topology for large clusters.
- Good knowledge of Kafka and of ZooKeeper for cluster coordination.
- Experience in Linux/Unix shell scripting.
- Experience in analysis, design, development, testing, and implementation of system applications.
- Excellent analytical, problem-solving, and communication skills, with the ability to work as part of a team as well as independently.
- Involved in daily Scrum meetings; used Scrum/Agile methodologies and Radar.
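Illustrative snippet (a minimal PySpark sketch of the multi-format DataFrame work mentioned above; the paths and the event_type column are hypothetical placeholders, and the Avro reader assumes the spark-avro package is available on the cluster):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("file-format-demo").getOrCreate()

    # Read the same kind of event data from three common storage formats.
    orc_df = spark.read.orc("/data/raw/events_orc")
    parquet_df = spark.read.parquet("/data/raw/events_parquet")
    avro_df = spark.read.format("avro").load("/data/raw/events_avro")

    # The DataFrame API is identical regardless of source format, so the same
    # cleaning and preprocessing code applies to all of them.
    cleaned = parquet_df.dropDuplicates().filter(parquet_df["event_type"].isNotNull())
    cleaned.write.mode("overwrite").parquet("/data/curated/events")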
TECHNICAL SKILLS:
Big Data Technologies: Hadoop Ecosystem, MapReduce, Spark, Kafka, Hive, NiFi, Sqoop, ZooKeeper
Programming Languages: Scala, Python (PySpark)
Scheduling Tools: Crontab, Chronos
Databases: SQL Server 2008/2012, Oracle, Cassandra
Environment: Windows, Linux/Ubuntu, Mac OS
Frameworks: Hadoop, Hive, Pig, Spark, Kafka
Cloud Services: Google Cloud Platform, AWS, McQueen (Confidential internal)
PROFESSIONAL EXPERIENCE:
Confidential, Sunnyvale, CA
Data engineer
- Working as a data engineer at Confidential on the Confidential iCloud Analytics mail team.
- Worked with Spark and Kafka through PySpark for data transformation.
- Migrated iCloud mail log data from an Oracle data source to McQueen.
- Worked closely with data scientists, cleaning and preprocessing data to provide the datasets used to train their models.
- Worked with NoSQL databases such as Cassandra.
- Built complex ETL data pipelines using Apache NiFi.
- Built pipelines for a continuous flow of data from different data sources, integrating them with databases and cloud storage using Apache NiFi.
- Created ETL mappings using connected, unconnected, and dynamic lookups with different caches, such as persistent caches.
- Worked on complex pipelines that fetch data from APIs, run it through complex transformations, and store it in the desired locations.
- Worked in a production environment where Spark jobs run for more than 150 hours, debugging errors at every stage to keep the jobs running smoothly.
- Worked on bare-metal and Mesos clusters and with scheduling tools such as crontab and Chronos.
- Worked with Kafka for streaming jobs, using Kafka as intermediate storage before moving data to McQueen (Confidential internal cloud service); see the sketch at the end of this role.
- Worked with Docker, managing Docker images across clusters installed on different machines.
- Migrated huge log files from one environment to another by running PySpark scripts.
- Used Git and GitHub for code review and version control.
- Extensively used Homebrew, pip, and cURL.
- Used Postman for working with REST APIs.
Environment: Spark, Kafka, Mesos, bare metal, crontab, Chronos, PySpark, Mac OS
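Illustrative snippet (a minimal Spark 2.x Structured Streaming sketch of the Kafka-to-storage flow described above; the broker address, topic name, and output paths are hypothetical placeholders, the spark-sql-kafka package is assumed to be on the classpath, and McQueen itself is internal and not shown):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("mail-log-stream").getOrCreate()

    # Read raw log events from a Kafka topic as a streaming DataFrame.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
           .option("subscribe", "icloud-mail-logs")             # placeholder topic
           .load())

    # Kafka delivers the payload as binary; cast it to string for downstream parsing.
    logs = raw.select(col("value").cast("string").alias("raw_line"))

    # Land the records as Parquet; a production job would push to the target store.
    query = (logs.writeStream
             .format("parquet")
             .option("path", "/tmp/mail_logs_out")               # placeholder path
             .option("checkpointLocation", "/tmp/mail_logs_ckpt")
             .start())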
Confidential, Jacksonville, FL
Hadoop/Spark Developer
- Hands-on experience working with Spark and Spark SQL.
- Queried data using Spark SQL on top of the Spark engine.
- Experience managing and monitoring the Hadoop cluster using Cloudera Manager.
- Developed HQL scripts to perform transformation logic and to load data from the staging zone into the landing and semantic zones.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Used Sqoop to transfer data between DB2 and HDFS.
- Wrote transformations for various business logic in HiveQL running on Spark.
- Loaded .csv files into Hive databases according to the business logic.
- Wrote transformations for tables of about 160 columns and upserted the results into a DB2 database using Spark.
- Created and worked with large DataFrames with schemas of more than 300 columns.
- Created strongly typed Datasets.
- Wrote Scala functions as needed for column validation and data cleansing to implement the required logic.
- Created UDFs when required and registered them for use throughout the application (see the sketch at the end of this role).
- Worked with various file formats such as Parquet, ORC, and Avro.
- Developed quality code adhering to Scala coding standards and best practices.
- Performance-tuned Spark jobs by setting the right batch interval, choosing the correct level of parallelism, tuning memory, adjusting configuration properties, and using broadcast variables.
- Performed Hive data cleansing (developed in the Eclipse IDE), including trimming values, joining columns, computing aggregations such as percentages, and stripping leading zeros from columns, then applied column-level transformations and stored the results in Hive databases.
- Wrote CASE statements in HiveQL.
- Used Spark for parallel data processing and better performance.
- Developed scalable solutions using the NoSQL database Cassandra.
- Wrote transformations for Hive tables using Spark and upserted the results to DB2.
- Worked closely on Cassandra loading activity for history and incremental loads from Oracle databases, resolving loading issues and tuning the loader for optimal performance.
- Migrated several databases from an on-premise data center to Cassandra.
Environment: Hive, Hadoop cluster, Sqoop, Spark 2.1.1, Scala, IntelliJ, Cassandra, Toad for DB2
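This role used Scala, but for consistency with the other snippets here is a minimal PySpark sketch of the UDF-plus-HiveQL pattern described above (the database, table, and column names are hypothetical placeholders, and a configured Hive metastore is assumed):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = (SparkSession.builder
             .appName("hive-transform-demo")
             .enableHiveSupport()    # requires a configured Hive metastore
             .getOrCreate())

    # Column-validation helper registered as a UDF so it can be called from SQL.
    def trim_leading_zeros(value):
        return value.lstrip("0") if value else value

    spark.udf.register("trim_leading_zeros", trim_leading_zeros, StringType())

    # HiveQL CASE statement combined with the registered UDF over a staging table.
    result = spark.sql("""
        SELECT trim_leading_zeros(account_id) AS account_id,
               CASE WHEN amount >= 0 THEN 'credit' ELSE 'debit' END AS txn_type
        FROM staging_db.transactions
    """)

    # Persist the transformed rows into the landing zone table.
    result.write.mode("overwrite").saveAsTable("landing_db.transactions_clean")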
Confidential
Hadoop Developer/Admin
Responsibilities:
- Worked on analyzing the Hadoop cluster and various big data analytics tools, including Pig, HBase, Sqoop, and ZooKeeper.
- Evaluated business requirements and prepared detailed specifications that followed project guidelines required to develop written programs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked as a Hadoop consultant on MapReduce, Pig, Hive, and Sqoop.
- Experience managing and monitoring the Hadoop cluster using Cloudera Manager.
- Experience with JavaScript MVC patterns, object-oriented JavaScript design patterns, and AJAX.
- Used Sqoop to transfer data between RDBMS and HDFS.
- Worked on the Hadoop MapReduce platform to implement Big Data solutions using Hive, MapReduce, shell scripting, and Java technologies.
- Converted existing SQL queries into HiveQL queries.
- Analyzed large data sets to determine the optimal way to aggregate and report on them using MapReduce programs (see the sketch at the end of this role).
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and exported data from HDFS to MySQL using Sqoop.
- Created test automation for big data solution components, including data verification, data quality checks, code coverage analysis, and data analysis.
- Worked on large node clusters, with strong experience in multi-node Hadoop cluster setup.
- Responsible for building data solutions in Hadoop using Cascading frameworks.
- Used Pig for data cleansing and for extracting data from web server output files to load into HDFS.
Environment: Big Data, Hive, Pig, JDBC, YARN, Hadoop.
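The MapReduce aggregation described above was implemented on the Hadoop platform (the role mentions Java and shell scripting); purely as an illustration of the map/reduce pattern itself, here is a minimal Hadoop Streaming sketch in Python (the tab delimiter and the choice of the third field as the grouping key are hypothetical; the two scripts would be passed to the hadoop-streaming jar via -mapper and -reducer):

    #!/usr/bin/env python
    # mapper.py: emit (key, 1) for every record, keyed on one field.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2:
            print("%s\t1" % fields[2])    # group by the third column

    #!/usr/bin/env python
    # reducer.py: input arrives sorted by key, so counts can be summed per key.
    import sys

    current_key, total = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, total))
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, total))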