Data Engineer Resume

Columbus, IN

SUMMARY

  • 7 years of total IT experience, including 4 years as a Data Engineer building data pipelines for ingesting and transforming data.
  • Hands-on experience writing complex SQL queries to extract, transform, and load (ETL) data from databases.
  • Good knowledge of Big Data applications and implementation of end-to-end streaming solutions using Spark.
  • Knowledge of design & data modeling for OLTP & OLAP databases with problem solving and analytical skills.
  • Strong hands-on experience in data cleaning and exploration using various libraries in Python and Scala.
  • Experience in data load management, importing and exporting data using Sqoop and Flume.
  • Experience in scheduling and monitoring jobs using Oozie, Hue, and Appworx.
  • Worked on real-time data integration using Kafka, Spark Streaming, and HBase.
  • Experience working with Structured Streaming, accumulators, broadcast variables, and various levels of caching and optimization techniques in Spark.
  • Hands-on experience writing code in Scala, building JARs with Maven, and deploying them on a Databricks cluster.
  • Developed highly scalable Spark applications using the Spark Core, DataFrame, Spark SQL, and Spark Streaming APIs in Scala.
  • Worked on setting up and configuring the ELK stack for error-log capture and management.
  • Solid experience working with CSV, text, Avro, Parquet, ORC, and JSON data formats.
  • Experience working with the Hive data warehouse tool: creating tables, distributing data through static and dynamic partitioning and bucketing, and optimizing HiveQL queries (a brief sketch follows this list).
  • Worked on installing, configuring, and monitoring Apache Airflow for running both batch and streaming workflows.
  • Strong experience in writing SQL queries.
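
A minimal sketch of the static and dynamic partition loads mentioned above, run through Spark SQL with Hive support. The database, table, and column names (sales_db.orders, order_date, and so on) are hypothetical placeholders rather than an actual warehouse schema.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionLoadSketch extends App {
  val spark = SparkSession.builder()
    .appName("hive-partition-load-sketch")
    .enableHiveSupport()
    .getOrCreate()

  // Partitioned Hive table stored as Parquet (hypothetical schema).
  spark.sql(
    """CREATE TABLE IF NOT EXISTS sales_db.orders (order_id STRING, amount DOUBLE)
      |PARTITIONED BY (order_date STRING)
      |STORED AS PARQUET""".stripMargin)

  // Static partitioning: the target partition is named explicitly.
  spark.sql(
    """INSERT OVERWRITE TABLE sales_db.orders PARTITION (order_date = '2021-01-01')
      |SELECT order_id, amount FROM sales_db.orders_staging WHERE order_date = '2021-01-01'""".stripMargin)

  // Dynamic partitioning: partitions are derived from the data itself.
  spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
  spark.sql(
    """INSERT OVERWRITE TABLE sales_db.orders PARTITION (order_date)
      |SELECT order_id, amount, order_date FROM sales_db.orders_staging""".stripMargin)

  spark.stop()
}
```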

TECHNICAL SKILLS

Programming: Python, Scala, Java, R, JavaScript, C

Big Data: HDFS, MapReduce, HIVE, Apache Spark, Kafka, Nifi, Airflow, Databricks

Databases: MySQL, SQL/PL-SQL, Microsoft SQL Server, Redshift, Cassandra, HBase

BI/Analytics Tools: Tableau, Kibana, Grafana, D3.js, Shiny, Plotly, MS Excel

Scripting/ Web Languages: JavaScript, HTML5, CSS3, XML, SQL, JSON, Shell

ETL Tools: Appworx, Sqoop, Oozie, Hue

Office Tools: MS-Office, MS-Project, Visio, Confluence, Jira, Asana

Software Life Cycles: Waterfall and Agile models

Utilities/Tools: Eclipse, Tomcat, JUnit, SVN, Log4j, Ant, Maven, GitLab, Bitbucket, IntelliJ IDEA, Postman

Cloud Platforms: Microsoft Azure, AWS

PROFESSIONAL EXPERIENCE

Confidential, Columbus, IN

DATA ENGINEER

Responsibilities:

  • Developed pipelines to process data in near real-time
  • Played a key role in migrating the frameworks' environment to the latest Databricks Runtime, 7.3 LTS.
  • Developed a solution to read and store data in a flattened JSON format to overcome schema-drift challenges (see the flattening sketch after this list).
  • Designed and implemented an in-house feature store (reusable functions) used to triangulate engine condition from engine sensor and servicing data.
  • Worked on Structured Streaming to read encrypted messages from Amazon SQS.
  • Migrated from the traditional spark-submit framework on Azure HDInsight to Databricks; all workloads were moved to DBR 5.5 LTS and later to DBR 7.3.
  • Upgraded to Delta Lake: migrated Hive tables from Parquet to Delta format in the Azure Data Lake Gen2 environment, which brought a significant improvement in overall query performance for the team (see the conversion sketch after this list).
  • Implemented Structured Streaming: built an end-to-end structured streaming solution for a product, replacing an existing batch data pipeline with a near real-time pipeline from the raw layer to the feature layer.
  • Appworx to Databricks setup: Carried out a POC to execute API-based calls from Appworx to Databricks.
  • Identified and resolved challenges related to the management model, server scale-up, and master-slave networking.
  • Databricks workspace setup: Worked on setting up an NPIP (no public IP) Databricks workspace for product teams.
  • Airflow setup: Installed and configured Apache Airflow for workflow management and created workflows in Python.
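
A minimal sketch of the JSON-flattening approach mentioned above: every nested struct is expanded into a top-level column so that downstream tables see a stable, flat schema despite upstream drift. The ADLS Gen2 paths and the telemetry dataset are hypothetical.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

object FlattenJsonSketch extends App {
  val spark = SparkSession.builder().appName("flatten-json-sketch").getOrCreate()

  // Recursively expand every nested struct field into a top-level column.
  def flatten(schema: StructType, prefix: String = ""): Seq[Column] =
    schema.fields.toSeq.flatMap { field =>
      val path = if (prefix.isEmpty) field.name else s"$prefix.${field.name}"
      field.dataType match {
        case nested: StructType => flatten(nested, path)
        case _                  => Seq(col(path).alias(path.replace(".", "_")))
      }
    }

  // Hypothetical ADLS Gen2 paths for the raw and flattened layers.
  val raw: DataFrame  = spark.read.json("abfss://raw@storageaccount.dfs.core.windows.net/telemetry/")
  val flat: DataFrame = raw.select(flatten(raw.schema): _*)

  flat.write.mode("append").json("abfss://curated@storageaccount.dfs.core.windows.net/telemetry_flat/")
  spark.stop()
}
```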
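
A minimal sketch of the Parquet-to-Delta conversion mentioned above, using the Delta Lake API. The table name, partition column, and ADLS Gen2 location are hypothetical.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

object ParquetToDeltaSketch extends App {
  val spark = SparkSession.builder().appName("parquet-to-delta-sketch").getOrCreate()

  // In-place conversion of the existing Parquet files; partition columns must be declared.
  DeltaTable.convertToDelta(
    spark,
    "parquet.`abfss://lake@storageaccount.dfs.core.windows.net/tables/engine_events`",
    "event_date STRING")

  // Re-register the table in the metastore so queries pick up the Delta format.
  spark.sql(
    """CREATE TABLE IF NOT EXISTS analytics.engine_events
      |USING DELTA
      |LOCATION 'abfss://lake@storageaccount.dfs.core.windows.net/tables/engine_events'""".stripMargin)

  spark.stop()
}
```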

Environment: Apache Spark, Databricks, Microsoft Azure, Scala, SQL, Python, Hive

Confidential, Irving, TX

DATA ENGINEER

Responsibilities:

  • System of Insights (SOI) framework: As part of the SOI team, worked on developing and maintaining frameworks for data ingestion and transformation.
  • Spark ETL pipelines: Developed ETL pipelines to ingest transactional data, transform it, and move it through a real-time processing pipeline into the data warehouse for analysis.
  • Developed pipelines to process data in near real-time
  • Worked on Spark Structured Streaming to develop a live streaming data pipeline with Kafka topics as the source and insights written to Cassandra; the data arrived in JSON/XML format and was stored in Cassandra (see the streaming sketch after this list).
  • Performed data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data.
  • Handled large datasets using partitioning, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself.
  • Designed and developed a system to collect data from multiple portals using Kafka and then process it using Spark.
  • Cassandra setup and development: Involved in optimizing Cassandra keyspaces for low latency and high fault tolerance.
  • Involved in developing the Insight Store data model for Cassandra, which was used to store the transformed data.
  • Hive development: Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
  • ELK stack development: Worked on setting up and configuring the ELK stack for error-log capture and management.
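
A minimal sketch of the Kafka-to-Cassandra streaming flow described above: read from a Kafka topic, parse the JSON payload, and write each micro-batch to Cassandra. The broker address, topic, keyspace, table, and field names are hypothetical, and the write assumes the spark-cassandra-connector package is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object KafkaToCassandraSketch extends App {
  val spark = SparkSession.builder().appName("kafka-to-cassandra-sketch").getOrCreate()

  // Hypothetical payload schema for the incoming JSON messages.
  val schema = new StructType()
    .add("transaction_id", StringType)
    .add("account_id", StringType)
    .add("event_time", TimestampType)

  // Read the Kafka topic and parse the value column into typed fields.
  val insights: DataFrame = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "transactions")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("payload"))
    .select("payload.*")

  // Write each micro-batch to Cassandra via the spark-cassandra-connector.
  val writeBatch: (DataFrame, Long) => Unit = (batch, batchId) =>
    batch.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "insights", "table" -> "transactions"))
      .mode("append")
      .save()

  val query = insights.writeStream
    .foreachBatch(writeBatch)
    .option("checkpointLocation", "/tmp/checkpoints/kafka-to-cassandra")
    .start()

  query.awaitTermination()
}
```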

Environment: Apache Spark, Kafka, Scala, SQL, Python, Hive, Cassandra, HBase, ELK, Grafana, AWS

Confidential

DATA ENGINEER-BIG DATA DEVELOPER

Responsibilities:

  • Worked with the BI team on Big Data Hadoop cluster implementation and data integration while developing large-scale system software.
  • Processed incoming files using the Spark native API.
  • Used the Spark Streaming and Spark SQL APIs to process the files.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Processed schema-oriented and non-schema-oriented data using Scala and Spark.
  • Developed a Flume ETL job to handle data from an HTTP source with HDFS as the sink.
  • Collected JSON data from the HTTP source and developed Spark APIs to perform inserts and updates in Hive tables.
  • Created Hive tables and was involved in data loading and writing Hive UDFs.
  • Developed Spark scripts to import large files from Amazon S3 buckets.
  • Developed Spark Core and Spark SQL scripts using Scala for faster data processing.
  • Developed a Kafka consumer in Scala for consuming data from Kafka topics (see the consumer sketch after this list).
  • Developed Spark jobs using Scala in the test environment for faster real-time analytics and used Spark SQL for querying.
  • Designed and developed a system to collect data from multiple portals using Kafka and then process it using Spark.
  • Designed and developed an automated process using shell scripting for data movement and purging.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using DataFrames/SQL/Datasets as well as RDD/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop (a UDF and broadcast-join sketch follows this list).
  • Handled large datasets using partitioning, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself.
  • Developed Spark code using Python and Spark SQL/Streaming for faster data processing.
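
A minimal sketch of a Kafka consumer written in Scala against the plain Java client, as referenced above. The broker address, group id, and topic name are hypothetical.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object KafkaConsumerSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "broker1:9092")
  props.put("group.id", "portal-ingest")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(Collections.singletonList("portal-events"))

  try {
    while (true) {
      // Poll for new records and hand each payload to downstream processing.
      val records = consumer.poll(Duration.ofMillis(500)).asScala
      records.foreach(record => println(s"${record.key()} -> ${record.value()}"))
    }
  } finally {
    consumer.close()
  }
}
```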
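
A minimal sketch of a Scala UDF used for aggregation through both the DataFrame API and Spark SQL, together with a broadcast join on a small lookup table, as referenced above. The sketch targets the current SparkSession API rather than Spark 1.6, and the table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, col, sum, udf}

object UdfAggregationSketch extends App {
  val spark = SparkSession.builder()
    .appName("udf-aggregation-sketch")
    .enableHiveSupport()
    .getOrCreate()

  // UDF that normalises free-text region names before grouping.
  val normaliseRegion = udf((region: String) => Option(region).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))
  spark.udf.register("normalise_region", normaliseRegion)

  val orders  = spark.table("sales_db.orders")
  val regions = spark.table("sales_db.region_lookup") // small dimension table

  // DataFrame API: broadcast the small side of the join, then aggregate.
  val totals = orders
    .join(broadcast(regions), Seq("region_code"))
    .groupBy(normaliseRegion(col("region_name")).alias("region"))
    .agg(sum("amount").alias("total_amount"))

  // The same aggregation expressed in Spark SQL using the registered UDF.
  spark.sql(
    """SELECT normalise_region(region_name) AS region, SUM(amount) AS total_amount
      |FROM sales_db.orders o JOIN sales_db.region_lookup r USING (region_code)
      |GROUP BY normalise_region(region_name)""".stripMargin).show(10)

  totals.write.mode("overwrite").saveAsTable("sales_db.region_totals")
  spark.stop()
}
```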

Environment: HDFS, Scala, Spark, Cloudera Manager, Sqoop, PL/SQL, MySQL, Windows, HBase.

Confidential

SOFTWARE ENGINEER - JAVA DEVELOPER

Responsibilities:

  • Involved in the analysis and design phases of the Software Development Life Cycle (SDLC).
  • Used JMS to pass messages as payload to track statuses, milestones and states in the workflows.
  • Involved in reading & generating pdf documents using ITEXT and also merge the pdfs dynamically.
  • Involved in the software development life cycle coding, testing, and implementation.
  • Worked in the health - care domain.
  • Involved in Using Java Message Service (JMS) for loosely coupled, reliable and asynchronous exchange of patient treatment information among J2EE components and legacy system
  • Developed MDBs using JMS to exchange messages between different applications using MQ Series.
  • Involved in working with J2EE design patterns (Singleton, Factory, DAO, and Business Delegate) and Model-View-Controller architecture with JSF and Spring DI.
  • Involved in Content Management using XML.
  • Developed a standalone module that transforms 837 XML files into the database using a SAX parser (see the parsing sketch after this list).
  • Installed, configured, and administered WebSphere ESB v6.x.
  • Worked on Performance tuning of WebSphere ESB in different environments on different platforms.
  • Configured and Implemented web services specifications in collaboration with offshore team.
  • Involved in creating dashboard charts (business charts) using FusionCharts.
  • Involved in creating reports for most of the business criteria.
  • Involved in configuring WebLogic servers, data sources, and JMS queues, and in deployment.
  • Involved in creating queues, MDBs, and workers to accommodate the messaging used to track the workflows.
  • Created Hibernate mapping files, sessions, transactions, Queries, and Criteria to fetch data from the database.
  • Enhanced the design of an application by utilizing SOA.
  • Generated unit test cases with the help of internal tools.
  • Used JNDI for connection pooling.
  • Developed ANT scripts to build and deploy projects onto the application server.
  • Involved in implementing CruiseControl as the continuous build tool, using Ant.
  • Used StarTeam for version control.
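
A minimal SAX-parsing sketch for the 837 transformation mentioned above, shown in Scala for consistency with the other examples (the original module was written in Java). Element names such as claim, claimId, and amount are hypothetical stand-ins for the actual 837 XML structure, and the file name is illustrative.

```scala
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable.ListBuffer

object ClaimSaxSketch extends App {
  final case class Claim(id: String, amount: String)

  // Event-driven handler that accumulates one Claim per <claim> element.
  class ClaimHandler extends DefaultHandler {
    val claims = ListBuffer.empty[Claim]
    private var currentElement = ""
    private var id = ""
    private var amount = ""

    override def startElement(uri: String, localName: String, qName: String, attrs: Attributes): Unit =
      currentElement = qName

    override def characters(ch: Array[Char], start: Int, length: Int): Unit = {
      val text = new String(ch, start, length).trim
      if (text.nonEmpty) currentElement match {
        case "claimId" => id = text
        case "amount"  => amount = text
        case _         =>
      }
    }

    override def endElement(uri: String, localName: String, qName: String): Unit =
      if (qName == "claim") claims += Claim(id, amount)
  }

  val handler = new ClaimHandler
  SAXParserFactory.newInstance().newSAXParser().parse("claims_837.xml", handler)
  // Each parsed Claim would then be written to the database (e.g. over JDBC).
  handler.claims.foreach(println)
}
```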

Environment: Java multithreading, JDBC, Hibernate, Struts, Collections, Maven, Subversion, JUnit, SQL, JSP, SOAP, Servlets, Spring, Oracle, XML, PuTTY, and Eclipse.
