Data Engineer Resume
Cincinnati, OH
SUMMARY
- Experienced in building highly scalable Big Data solutions on multiple Hadoop distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase and Cassandra).
- Experience across the software development life cycle (SDLC) for various applications, including analysis, design, development, implementation, maintenance, and support.
- Hands-on experience writing Spark SQL scripts and implementing Spark RDD transformations and actions in Python and Scala.
- Experience with Spark Core, Spark Streaming, HiveContext, and Spark SQL for data analysis.
- Good exposure to performance tuning of Hive queries and MapReduce-style jobs on the Spark framework.
- Expertise in designing and deploying Hadoop clusters and Big Data analytic tools, including Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Kafka, and Spark, in both Cloudera and Hortonworks environments.
- Experience developing MapReduce jobs in Java for data cleaning, transformation, pre-processing, and analysis.
- Hands-on experience designing Apache Airflow orchestrations for data ingestion and processing, both on-premises and on Google Cloud Platform.
- Experienced in working with cloud services such as Google Cloud.
- Good knowledge of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
- Good understanding of machine learning, data mining, and algorithms.
- Good understanding of messaging services such as Apache Kafka.
- Good understanding of cloud services on Amazon Web Services (AWS): EC2, S3, RDS, Lambda, etc.
- Analyzed streaming data and identified important trends for further analysis using Spark Streaming.
- Experience with, and a good understanding of, the internal workings of Apache Kafka.
- End-to-end experience designing and building data visualizations using Tableau.
- Participated in detailed object-oriented analysis and design, developing code in accordance with the design.
- Experienced in using relational databases such as MySQL and MS SQL Server, including writing SQL queries, stored procedures, and triggers.
- Familiar with the Java Virtual Machine (JVM) and multi-threaded processing.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Impala, Apache Spark, Spark Streaming, Spark SQL, Hue.
Programming Languages: Python, Scala, SQL, HQL
Databases: Oracle, MySQL, HBase
IDE Tools: VS-Code, IntelliJ
Frameworks: Hibernate, Spring, Struts
Web Technologies: HTML5, CSS3, JavaScript
Reporting/ETL Tools: Tableau, Microsoft Power BI
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential, Cincinnati, OH
Responsibilities:
- Involved in various stages of the project's data flow, such as control validation, data quality, and change data capture.
- Participated in the entire software development life cycle (SDLC) of the project, including analysis, design, development, implementation, maintenance, and support.
- Built various data stores for specific business functions (transaction, product, store, card, etc.).
- Built segmentations on the data assets to narrow down specific areas for targeted campaigns.
- Performed transformations, cleaning, and filtering on imported data using Python, Jupyter Notebooks, and Visual Studio Code, and loaded the results into a data lake on HDFS.
- Developed workflows in Apache Airflow to automate processes (see the Airflow sketch after this list).
- Implemented Spark RDD transformations and actions in Apache Spark to support business analysis (a PySpark sketch follows this list).
- Imported and exported data between RDBMS and HDFS using Sqoop.
- Built and maintained CI/CD pipelines on TeamCity to facilitate continuous delivery and deployment.
- Worked with Jupyter Notebooks on Google Cloud Platform (GCP).
- Built orchestrations with Apache Airflow on GCP.
- Worked closely with data scientists to accommodate their changing data requirements.
- Created partitioned and bucketed tables based on the hierarchy of the dataset.
- Applied various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Proficient in reading PL/SQL code and rebuilding equivalent functionality in Python and Spark.
- Experience tuning Spark applications.
- Good understanding of Spark SQL, the Spark transformation engine, and Spark Streaming.
- Experience using version control services (GitHub).
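A minimal sketch of an Airflow ingestion workflow like those described above, assuming the Airflow 1.x PythonOperator API; the DAG name, schedule, and task callables are hypothetical placeholders.

    # Illustrative sketch only: a daily ingestion DAG in Apache Airflow.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    def ingest():
        pass  # e.g., pull source files into the HDFS data lake

    def validate():
        pass  # e.g., run control validation / data quality checks

    with DAG(dag_id="daily_ingestion",            # hypothetical name
             start_date=datetime(2019, 1, 1),
             schedule_interval="@daily",
             catchup=False) as dag:
        ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
        validate_task = PythonOperator(task_id="validate", python_callable=validate)
        ingest_task >> validate_task              # validate runs after ingest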
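And a minimal PySpark sketch of RDD transformations and actions of the kind referenced above; the HDFS path and record layout (store_id,card_id,amount) are hypothetical.

    # Illustrative sketch only: RDD transformations and actions in PySpark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("transaction-cleanup").getOrCreate()
    sc = spark.sparkContext

    # Load raw CSV records from the data lake (hypothetical path/layout)
    raw = sc.textFile("hdfs:///data/lake/transactions/raw")

    # Transformations: parse, drop malformed rows, key by store
    parsed = (raw.map(lambda line: line.split(","))
                 .filter(lambda cols: len(cols) == 3 and cols[2])
                 .map(lambda cols: (cols[0], float(cols[2]))))

    # Action: total spend per store, sampled for inspection
    spend_per_store = parsed.reduceByKey(lambda a, b: a + b)
    print(spend_per_store.take(10))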
Environment: Cloudera Manager 5.15, HDFS, Hive, Spark 2.2, Airflow, Python, Jupyter Notebooks, Visual Studio Code, TeamCity, GitHub, Oracle.
Hadoop/Spark Developer
Confidential, Denver, CO
Responsibilities:
- Involved in various stages of the project's data flow, such as control validation, data quality, and change data capture.
- Performed data mining tasks depending on business scenarios.
- Experience with the Cloudera distribution of Hadoop (CDH 5.10).
- Participated in the entire software development life cycle (SDLC) of the project, including analysis, design, development, implementation, maintenance, and support.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Wrote SQL stored procedures in Hue to access data from Hive.
- Implemented Spark RDD transformations and actions in Apache Spark to support business analysis.
- Created Hive tables and integrated them per the design using the Parquet file format.
- Handled delta processing and incremental updates using Hive.
- Executed dynamic partitioning in Hive to segregate the customer database by age (see the partitioning sketch after this list).
- Designed and developed Pig Latin scripts and Pig command-line transformations for data processing.
- Wrote various joins in MySQL according to client requirements.
- Developed Hive scripts to meet analysts' requirements.
- Stored data in Hive and enabled end users to access it through Impala.
- Imported and exported data between RDBMS and HDFS using Sqoop (a Sqoop example follows this list).
- Created partitioned and bucketed tables in Hive based on the hierarchy of the dataset.
- Created several UDFs in Pig and Hive to provide additional support for the project.
- Good understanding of Spark SQL, the Spark transformation engine, and Spark Streaming.
- Applied various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Involved in cluster maintenance and monitoring.
- Used the Scala programming language extensively with Apache Spark for data processing.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
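A minimal sketch of the dynamic-partitioning approach mentioned above, expressed as HiveQL issued from PySpark; it is written against the Spark 2.x API for brevity (this project used Spark 1.6 with HiveContext), and the table and column names are hypothetical.

    # Illustrative sketch only: dynamic partitioning in Hive from PySpark.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("customer-partitioning")
             .enableHiveSupport().getOrCreate())

    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    spark.sql("""
        CREATE TABLE IF NOT EXISTS customers_by_age (
            customer_id BIGINT,
            name        STRING)
        PARTITIONED BY (age_band STRING)
        STORED AS PARQUET
    """)

    # The partition column must come last in the SELECT list
    spark.sql("""
        INSERT OVERWRITE TABLE customers_by_age PARTITION (age_band)
        SELECT customer_id, name,
               CASE WHEN age < 30 THEN 'under_30'
                    WHEN age < 60 THEN '30_to_59'
                    ELSE '60_plus' END AS age_band
        FROM customers_raw
    """)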
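And a sketch of a Sqoop import of the kind used for the RDBMS-to-HDFS transfers above, wrapped in Python for consistency with the other examples; the JDBC connection string, credentials, table, and target directory are placeholders.

    # Illustrative sketch only: Sqoop import from MySQL into HDFS.
    import subprocess

    subprocess.run([
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost:3306/sales",  # hypothetical DB
        "--username", "etl_user", "-P",   # -P prompts for the password
        "--table", "orders",
        "--target-dir", "/data/raw/orders",
        "--num-mappers", "4",
    ], check=True)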
Environment: MapReduce, Cloudera Manager 5.10, HDFS, Hive, Spark 1.6, Kafka, Scala, MySQL, Java (JDK 1.6), Eclipse.
Hadoop Developer
Confidential, CA
Responsibilities:
- Responsible for running Hadoop streaming jobs to process terabytes of XML-format data.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Optimized Hive joins for large tables and developed MapReduce code for a full outer join of two large tables.
- Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
- Developed Spark scripts using Scala shell commands as required.
- Developed MapReduce jobs using the Java API to parse raw data and store the refined data.
- Created HBase tables for random reads/writes by MapReduce programs.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Tuned Hive and Pig scripts to improve performance and solved issues in both through an understanding of joins, groups, and aggregations.
- Developed Sqoop scripts to extract data from Oracle source databases into HDFS.
- Migrated MapReduce programs to Spark transformations using Spark and Scala.
- Supported setup of the QA environment and updated configurations for implementing scripts with Pig and Sqoop; handled cluster coordination through ZooKeeper.
- Implemented Cloudera Manager on an existing cluster.
- Extensively worked with the Cloudera Distribution of Hadoop, CDH 5.x.
- Performed data ingestion from multiple internal clients using Apache Kafka (see the Kafka sketch after this list).
- Integrated Apache Storm with Kafka to perform web analytics; uploaded click-stream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Responsible for developing, supporting, and maintaining ETL (Extract, Transform, Load) processes using Talend.
- Developed Talend jobs and ensured data was loaded into Hive tables and HDFS files.
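A minimal sketch of Kafka-based ingestion like that described above, using the kafka-python client; the broker address, topic name, and event fields are hypothetical.

    # Illustrative sketch only: click-stream ingestion through Kafka.
    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Producer side: an internal client publishes JSON events
    producer = KafkaProducer(
        bootstrap_servers="broker:9092",
        value_serializer=lambda event: json.dumps(event).encode("utf-8"))
    producer.send("clickstream", {"user": "u123", "page": "/home"})
    producer.flush()

    # Consumer side: read events for landing into HDFS/HBase/Hive
    consumer = KafkaConsumer(
        "clickstream",
        bootstrap_servers="broker:9092",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")))
    for message in consumer:
        print(message.value)  # downstream write would go here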
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Impala, Cassandra, Kafka, SQL, Python, Spark, Linux, Java.
Java Developer
Confidential
Responsibilities:
- Implemented the Struts framework with MVC architecture.
- Developed the presentation layer using JSP, HTML, and CSS, with client-side validations in JavaScript.
- Collaborated with the ETL/Informatica team to determine the data models and UI designs needed to support Cognos reports.
- Performed several data quality checks and found potential issues, designed Ab Initio graphs to resolve them.
- Deployed and tested the application on the Tomcat web server.
- Involved in coding, code reviews, and JUnit testing; prepared and executed unit test cases.
- Used JUnit for unit testing and as the integration testing tool.
- Used Oracle coherence for real-time cache updates, live event processing, in-memory grid computations.
- Developed UI for Customer Service modules and reports using JSF, JSPs, and MyFaces components.
- Created custom JSP tags for maximum re-usability of user interface components.
- Tested and deployed the application on Tomcat.
Environment: Java, JSP, Hibernate, JUnit, JavaScript, Servlets, Struts, EJB, JSF, Ant, Tomcat, CVS, Eclipse, SQL Developer, Oracle.
