
Data Engineer Resume


Cincinnati, OH

SUMMARY

  • Experienced in building highly scalable Big Data solutions using multiple Hadoop distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase and Cassandra).
  • Experience in the Software Development Life Cycle (SDLC) for various applications, including Analysis, Design, Development, Implementation, Maintenance and Support.
  • Hands-on experience in writing Spark SQL scripts and implementing Spark RDD transformations and actions using Python/Scala.
  • Have experience in Spark Core, Spark Streaming, HiveContext and Spark SQL for analyzing data.
  • Good exposure to performance tuning of Hive queries and of MapReduce and Spark jobs.
  • Expertise in designing and deploying Hadoop clusters and Big Data analytics tools including Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Kafka and Spark, in both Cloudera and Hortonworks environments.
  • Experience in developing MapReduce jobs in Java for data cleaning, transformation, pre-processing and analysis.
  • Hands-on experience in designing Apache Airflow orchestrations for data ingestion and processing, both on-premises and on Google Cloud Platform (GCP).
  • Experienced in working with cloud services such as Google Cloud.
  • Good knowledge of distributed systems, HDFS architecture and the internal workings of the MapReduce and Spark processing frameworks.
  • Good understanding of Machine Learning, Data Mining and Algorithms.
  • Good understanding of messaging services like Apache Kafka.
  • Good understanding of cloud-based services such as Amazon Web Services (AWS): EC2, S3, RDS, Lambda, etc.
  • Experienced in analyzing streaming data and identifying important trends for further analysis using Spark Streaming.
  • Good understanding of the internal workings of the Apache Kafka streaming service.
  • End-to-end experience in designing and building data visualizations using Tableau.
  • Participated in detailed object-oriented analysis and design to develop code in accordance with the design.
  • Experienced in using relational databases like MySQL and MS SQL Server, including writing SQL queries, stored procedures, triggers, etc.
  • Familiar with Java virtual machine (JVM) and multi-threaded processing.
  • Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
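The internal working of MapReduce mentioned above can be sketched in plain Python — a minimal, illustrative map/shuffle/reduce word count on a single machine, not a distributed implementation (input lines are hypothetical):

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big deal"])))
```

The same three-stage shape underlies Hive queries that "run internally as MapReduce": the framework only adds distribution, spilling and fault tolerance around it.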

TECHNICAL SKILLS

Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Impala, Apache Spark, Spark Streaming, Spark SQL, Hue.

Programming Languages: Python, Scala, SQL, HQL

Databases: Oracle, MySQL, HBase

IDE Tools: VS-Code, IntelliJ

Frameworks: Hibernate, Spring, Struts

Web Technologies: HTML5, CSS3, JavaScript

Reporting Tools/ETL Tools: Tableau, Microsoft Power BI

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential, Cincinnati, OH

Responsibilities:

  • Involved in various stages of project data flow such as control validation, data quality and change data capture.
  • Involved in the entire Software Development Life Cycle (SDLC) of the project, including Analysis, Design, Development, Implementation, Maintenance and Support.
  • Built various data stores for specific business functionalities (transaction, product, store, card, etc.).
  • Built segmentations on data assets to narrow down specific areas for targeted campaigns.
  • Performed transformations, cleaning and filtering on imported data using Python, Jupyter Notebooks and Visual Studio Code, and loaded the data into a data lake on HDFS.
  • Experienced in developing workflows on Apache Airflow to automate processes.
  • Implemented Spark RDD transformations and actions in Apache Spark to support business analysis.
  • Exported data from RDBMS to HDFS and vice versa using Sqoop.
  • Experienced in building CI/CD pipelines on TeamCity to facilitate continuous delivery and deployment.
  • Experience working with Jupyter Notebooks on Google Cloud (GCP).
  • Experience building orchestrations with Apache Airflow on GCP.
  • Experienced in working closely with data scientists to cater to their changing data requirements.
  • Created partitioned and bucketed tables based on the hierarchy of the dataset.
  • Performed various performance optimizations, such as using the distributed cache for small datasets, and partitioning, bucketing and map-side joins in Hive.
  • Proficient in reading PL/SQL code and building equivalent functionality in Python and Spark.
  • Experience in tuning Spark applications.
  • Good understanding of Spark SQL, the Spark transformation engine and Spark Streaming.
  • Experience in using version control services (GitHub).
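The change-data-capture stage listed above can be sketched in plain Python — a minimal, illustrative diff of two keyed snapshots into insert/update/delete change sets (the transaction data is hypothetical), the same logic a CDC feed applies between loads:

```python
def capture_changes(old_snapshot, new_snapshot):
    """Diff two dicts keyed by record id into insert, update and delete sets."""
    inserts = {k: v for k, v in new_snapshot.items() if k not in old_snapshot}
    deletes = {k: v for k, v in old_snapshot.items() if k not in new_snapshot}
    updates = {k: v for k, v in new_snapshot.items()
               if k in old_snapshot and old_snapshot[k] != v}
    return inserts, updates, deletes

# Hypothetical card-transaction snapshots keyed by transaction id.
old = {1: {"amount": 10}, 2: {"amount": 25}, 4: {"amount": 7}}
new = {1: {"amount": 10}, 2: {"amount": 30}, 3: {"amount": 5}}
inserts, updates, deletes = capture_changes(old, new)
```

Only the three change sets, rather than the full snapshot, then need to be applied downstream.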

Environment: Cloudera Manager 5.15, HDFS, Hive, Spark 2.2, Airflow, Python, Jupyter Notebooks, Visual Studio Code, TeamCity, GitHub, Oracle.

Hadoop/Spark Developer

Confidential, Denver, CO

Responsibilities:

  • Involved in various stages of project data flow such as control validation, data quality and change data capture.
  • Performed data mining tasks depending on business scenarios.
  • Experience with Cloudera distribution of Hadoop (CDH 5.10).
  • Involved in the entire Software Development Life Cycle (SDLC) of the project, including Analysis, Design, Development, Implementation, Maintenance and Support.
  • Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Wrote SQL stored procedures in Hue to access the data from Hive.
  • Implemented Spark RDD transformations and actions in Apache Spark to support business analysis.
  • Created Hive tables and integrated them as per the design using the Parquet file format.
  • Handled delta processing (incremental updates) using Hive.
  • Executed dynamic partitioning in Hive to segregate the customer database based on age.
  • Designed and developed Pig Latin scripts and Pig command-line transformations for data processing.
  • Involved in writing various joins in MySQL depending on client requirement.
  • Developed Hive scripts to meet analysts' requirements for analysis.
  • Stored data in Hive and enabled end users to access it through Impala.
  • Exported data from RDBMS to HDFS and vice versa using Sqoop.
  • Created partitioned and bucketed tables in Hive based on the hierarchy of the dataset.
  • Created several UDFs in Pig and Hive to give additional support for the project.
  • Good understanding of Spark SQL, the Spark transformation engine and Spark Streaming.
  • Performed various performance optimizations, such as using the distributed cache for small datasets, and partitioning, bucketing and map-side joins in Hive.
  • Involved in cluster maintenance and monitoring.
  • Experienced in the Scala programming language, used extensively with Apache Spark for data processing.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.
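The partitioning and bucketing scheme described above can be mimicked in plain Python — a sketch, over a hypothetical customer dataset, of how records are routed to an age-based partition and a hash bucket, analogous to Hive's dynamic partitioning and CLUSTERED BY ... INTO n BUCKETS (the band names and bucket count are assumptions for illustration):

```python
def age_partition(age):
    # Dynamic-partition key: coarse age bands, like a PARTITIONED BY column.
    if age < 30:
        return "age_band=under_30"
    if age < 60:
        return "age_band=30_to_59"
    return "age_band=60_plus"

def bucket(customer_id, num_buckets=4):
    # Bucket number: clustering key modulo bucket count, so lookups and
    # bucketed map-side joins touch only one bucket per key.
    return customer_id % num_buckets

# Hypothetical (customer_id, age) records routed to (partition, bucket) slots.
customers = [(101, 25), (102, 45), (103, 72)]
layout = {(age_partition(age), bucket(cid)) for cid, age in customers}
```

Pruning then works the same way it does in Hive: a query filtered on an age band reads only that partition's buckets instead of scanning the whole table.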

Environment: MapReduce, Cloudera Manager 5.10, HDFS, Hive, Spark 1.6, Kafka, Scala, MySQL, Java (JDK 1.6), Eclipse.

Hadoop Developer

Confidential, CA

Responsibilities:

  • Responsible for running Hadoop streaming jobs to process terabytes of XML-format data.
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
  • Optimized Hive joins for large tables and developed MapReduce code for a full outer join of two large tables.
  • Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Developed MapReduce jobs in Java API to parse the raw data and store the refined data.
  • Created HBase tables for random reads/writes by MapReduce programs.
  • Loaded the data into Spark RDD and performed in-memory data computation to generate the output response.
  • Tuned Hive and Pig scripts to improve performance, resolving issues in both with an understanding of joins, groups and aggregations.
  • Developed Sqoop scripts to extract data from Oracle source databases onto HDFS.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Supported setting up the QA environment and updating configurations for implementing Pig and Sqoop scripts; handled cluster coordination through ZooKeeper.
  • Implemented Cloudera Manager on existing cluster.
  • Extensively worked with Cloudera Distribution of Hadoop, CDH 5.x.
  • Performed Data Ingestion from multiple internal clients using Apache Kafka.
  • Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Responsible for developing, supporting and maintaining ETL (Extract, Transform and Load) processes using Talend.
  • Developed Talend jobs to load data into Hive tables and HDFS files.
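The full outer join mentioned above can be sketched in plain Python — an illustrative reduce-side join of two keyed datasets (the order/shipment names are hypothetical), not the actual MapReduce code, which would shuffle both inputs by the join key first:

```python
def full_outer_join(left, right):
    """Full outer join of two dicts on their keys; a missing side is None."""
    keys = set(left) | set(right)  # union of keys keeps unmatched rows from both sides
    return {k: (left.get(k), right.get(k)) for k in sorted(keys)}

# Hypothetical datasets keyed by order id.
orders = {1: "laptop", 2: "phone"}
shipments = {2: "shipped", 3: "pending"}
joined = full_outer_join(orders, shipments)
```

An inner join would intersect the key sets instead of taking their union; the MapReduce version produces the same pairs, one reducer call per join key.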

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Impala, Cassandra, Kafka, SQL, Python, Spark, Linux, Java.

Java Developer

Confidential

Responsibilities:

  • Implemented the Struts framework with MVC architecture.
  • Developed the presentation layer using JSP, HTML, CSS and client-side validations using JavaScript.
  • Collaborated with the ETL/Informatica team to determine the necessary data models and UI designs to support Cognos reports.
  • Performed several data quality checks and found potential issues, designed Ab Initio graphs to resolve them.
  • Deployed and tested the application using Tomcat web server.
  • Involved in coding, code reviews and JUnit testing; prepared and executed unit test cases.
  • Used JUnit for both unit testing and integration testing.
  • Used Oracle Coherence for real-time cache updates, live event processing and in-memory grid computations.
  • Developed UI for Customer Service modules and reports using JSF, JSPs and MyFaces components.
  • Created custom JSP tags for maximum reusability of user-interface components.
  • Tested and deployed the application on Tomcat.

Environment: Java, JSP, Hibernate, JUnit, JavaScript, Servlets, Struts, EJB, JSF, Ant, Tomcat, CVS, Eclipse, SQL Developer, Oracle.
