Data Engineer Resume
Columbus, IN
SUMMARY
- 7 years of total IT experience, including 4 years as a Data Engineer building data pipelines for ingesting and transforming data.
- Hands-on experience in writing complex SQL queries to extract, transform, and load (ETL) data from databases.
- Good knowledge of Big Data applications and implementation of end-to-end streaming solutions using Spark.
- Knowledge of design and data modeling for OLTP and OLAP databases, with strong problem-solving and analytical skills.
- Strong hands-on experience in data cleaning and exploration using various libraries in Python and Scala.
- Experience in data load management, importing and exporting data using Sqoop and Flume.
- Experience in scheduling and monitoring jobs using Oozie, Hue, and Appworx.
- Worked on real-time data integration using Kafka, Spark Streaming, and HBase.
- Experience working with Structured Streaming, Accumulators, broadcast variables, various levels of caching, and optimization techniques in Spark.
- Hands-on experience writing code in Scala, building JARs with Maven, and deploying them on Databricks clusters.
- Developed highly scalable Spark applications using Spark Core, DataFrames, Spark SQL, and the Spark Streaming API in Scala.
- Worked on setting up and configuring the ELK Stack for error log capture and management.
- Solid experience working with CSV, text, Avro, Parquet, ORC, and JSON data formats.
- Experience working with the Hive data warehouse tool: creating tables, distributing data with static and dynamic partitioning and bucketing, and optimizing HiveQL queries (see the sketch after this summary).
- Worked on installing, configuring, and monitoring Apache Airflow for running both batch and streaming workflows.
- Strong experience in writing SQL queries.
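A minimal sketch of the Hive partitioning and bucketing approach referenced in this summary, expressed through Spark SQL in Scala; the table names, columns, partition column, and bucket count are illustrative assumptions, not project specifics:

    import org.apache.spark.sql.SparkSession

    object HivePartitioningSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-partitioning-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hive table partitioned by date and bucketed by customer_id (names are illustrative).
        spark.sql(
          """CREATE TABLE IF NOT EXISTS sales_partitioned (
            |  order_id    BIGINT,
            |  customer_id BIGINT,
            |  amount      DOUBLE)
            |PARTITIONED BY (order_date STRING)
            |CLUSTERED BY (customer_id) INTO 32 BUCKETS
            |STORED AS PARQUET""".stripMargin)

        // Enable dynamic partitioning so partition values are derived from the data.
        spark.sql("SET hive.exec.dynamic.partition = true")
        spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

        // Dynamic-partition insert from a staging table; the partition column goes last in the SELECT.
        spark.sql(
          """INSERT OVERWRITE TABLE sales_partitioned PARTITION (order_date)
            |SELECT order_id, customer_id, amount, order_date
            |FROM sales_staging""".stripMargin)

        spark.stop()
      }
    }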
TECHNICAL SKILLS
Programming: Python, Scala, Java, R, JavaScript, C
Big Data: HDFS, MapReduce, HIVE, Apache Spark, Kafka, Nifi, Airflow, Databricks
Databases: MySQL, SQL/PL-SQL, Microsoft SQL Server, Redshift, Cassandra, HBase
BI/Analytics Tools: Tableau, Kibana, Grafana, D3.js, Shiny, Plotly, MS Excel
Scripting/ Web Languages: JavaScript, HTML5, CSS3, XML, SQL, JSON, Shell
ETL Tools: Appworx, Sqoop, Oozie, Hue
Office Tools: MS-Office, MS-Project, Visio, Confluence, Jira, Asana
Software Life Cycles: Waterfall and Agile models
Utilities/Tools: Eclipse, Tomcat, JUnit, SVN, Log4j, ANT, Maven, GitLab, Bitbucket, IntelliJ IDE, Postman
Cloud Platforms: Microsoft Azure, AWS
PROFESSIONAL EXPERIENCE
Confidential, Columbus, IN
DATA ENGINEER
Responsibilities:
- Developed pipelines to process data in near real-time
- Played a key role in migrating the framework environments to the latest Databricks Runtime, 7.3 LTS
- Developed a solution to read and store data in flattened JSON format to overcome schema drift challenges
- Designed and implemented an in-house feature store (reusable functions) used to triangulate engine condition based on engine sensor and servicing data
- Worked on Structured Streaming to read encrypted messages from Amazon SQS
- Migrated workloads from the traditional spark-submit framework on Azure HDInsight to Databricks, first to DBR 5.5 LTS and later to DBR 7.3.
- Upgraded to Delta Lake: migrated Hive tables from Parquet to Delta format in the Azure Data Lake Storage Gen2 environment, significantly improving overall query performance for the team (see the sketch after this role).
- Implemented Structured Streaming: built an end-to-end Structured Streaming solution for a product, replacing an existing batch data pipeline with a near real-time pipeline from the raw layer to the feature layer.
- Appworx to Databricks setup: carried out a POC to execute API-based calls from Appworx to Databricks.
- Identified and resolved challenges related to the management model, server scale-up, and master-slave network issues.
- Databricks workspace setup: set up a No Public IP (NPIP) Databricks workspace for product teams.
- Airflow setup: installed and configured Apache Airflow for workflow management and created workflows (DAGs) in Python.
Environment: Apache Spark, Databricks, Microsoft Azure, Scala, SQL, Python, Hive
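A minimal sketch of the Parquet-to-Delta migration described in this role, assuming a Databricks runtime with Delta Lake available; the ADLS Gen2 path, table name, and partition column are illustrative assumptions:

    import io.delta.tables.DeltaTable
    import org.apache.spark.sql.SparkSession

    object ParquetToDeltaSketch {
      def main(args: Array[String]): Unit = {
        // On Databricks a session already exists; getOrCreate simply reuses it.
        val spark = SparkSession.builder().getOrCreate()

        // Illustrative ADLS Gen2 location of an existing Parquet dataset partitioned by order_date.
        val path = "abfss://lake@storageaccount.dfs.core.windows.net/raw/sales"

        // Convert the Parquet files in place to Delta format.
        DeltaTable.convertToDelta(spark, s"parquet.`$path`", "order_date STRING")

        // Re-register the table so downstream queries read the Delta table.
        spark.sql(s"CREATE TABLE IF NOT EXISTS sales_delta USING DELTA LOCATION '$path'")

        // Compact small files to improve query performance (OPTIMIZE is available on Databricks).
        spark.sql(s"OPTIMIZE delta.`$path`")
      }
    }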
Confidential, Irving, TX
DATA ENGINEER
Responsibilities:
- System of Insights framework: as part of the S.O.I. team, worked on developing and maintaining frameworks for data ingestion and transformation.
- Spark ETL pipelines: developed ETL pipelines to ingest transactional data, transform it, and move it through a real-time processing pipeline into the data warehouse for analysis.
- Developed pipelines to process data in near real-time
- Worked on Spark Structured Streaming to develop a live streaming data pipeline with Kafka topics as the source and insights written to Cassandra; the incoming data arrived in JSON/XML format and was stored in Cassandra (see the sketch after this role).
- Performed data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce extraction jobs.
- Handled large datasets during the ingestion process itself using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations.
- Designed and developed a system to collect data from multiple portals using Kafka and process it using Spark.
- Setup and development in Cassandra: involved in optimizing Cassandra keyspaces for low latency and high fault tolerance.
- Involved in developing the Insight Store data model for Cassandra, which was used to store the transformed data.
- Development in Hive: involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
- ELK Stack development: set up and configured the ELK Stack for error log capture and management.
Environment: Apache Spark, Kafka, Scala, SQL, Python, Hive, Cassandra, HBase, ELK, Grafana, AWS
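A minimal sketch of the Kafka-to-Cassandra Structured Streaming pipeline described in this role, assuming the DataStax spark-cassandra-connector is on the classpath; the topic, message schema, keyspace, and table names are illustrative assumptions:

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object KafkaToCassandraSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-cassandra-sketch")
          .config("spark.cassandra.connection.host", "127.0.0.1")
          .getOrCreate()
        import spark.implicits._

        // Expected shape of the JSON messages on the topic (illustrative).
        val schema = new StructType()
          .add("txn_id", StringType)
          .add("account_id", StringType)
          .add("amount", DoubleType)
          .add("event_time", TimestampType)

        // Read the Kafka topic and parse the JSON payload into columns.
        val transactions = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "transactions")
          .load()
          .select(from_json($"value".cast("string"), schema).as("t"))
          .select("t.*")

        // Write each micro-batch to Cassandra through the DataFrame writer.
        val query = transactions.writeStream
          .foreachBatch { (batch: DataFrame, _: Long) =>
            batch.write
              .format("org.apache.spark.sql.cassandra")
              .options(Map("keyspace" -> "insights", "table" -> "transactions"))
              .mode("append")
              .save()
          }
          .option("checkpointLocation", "/tmp/checkpoints/transactions")
          .start()

        query.awaitTermination()
      }
    }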
Confidential
DATA ENGINEER-BIG DATA DEVELOPER
Responsibilities:
- Worked with the BI team on Big Data Hadoop cluster implementation and data integration, developing large-scale system software.
- Processed incoming files using the Spark native API.
- Used the Spark Streaming and Spark SQL APIs to process the files.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Processed schema-oriented and non-schema-oriented data using Scala and Spark.
- Developed a Flume ETL job to handle data from an HTTP source with HDFS as the sink.
- Collected JSON data from the HTTP source and developed Spark APIs to perform inserts and updates in Hive tables.
- Created Hive tables and involved in data loading and writing Hive UDFs.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed a Kafka consumer API in Scala for consuming data from Kafka topics (see the sketch after this role).
- Developed Spark jobs using Scala in the test environment for faster real-time analytics and used Spark SQL for querying.
- Designed and developed a system to collect data from multiple portals using Kafka and process it using Spark.
- Designed and developed an automated process for data movement and purging using shell scripting.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
- Handled large datasets during the ingestion process itself using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations.
- Developed Spark code using Python and Spark SQL/Streaming for faster data processing.
Environment: HDFS, Scala, Spark, Cloudera Manager, Sqoop, PL/SQL, MySQL, Windows, HBase.
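A minimal sketch of the kind of Kafka consumer described in this role, written in Scala against the plain Kafka clients API; the broker address, consumer group, and topic name are illustrative assumptions:

    import java.time.Duration
    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}

    object KafkaConsumerSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "portal-ingestion")
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
          "org.apache.kafka.common.serialization.StringDeserializer")
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
          "org.apache.kafka.common.serialization.StringDeserializer")
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

        val consumer = new KafkaConsumer[String, String](props)
        consumer.subscribe(Collections.singletonList("portal-events"))

        try {
          // Poll in a loop and hand each record to downstream processing.
          while (true) {
            val records = consumer.poll(Duration.ofMillis(500))
            records.forEach { record =>
              println(s"offset=${record.offset()} key=${record.key()} value=${record.value()}")
            }
          }
        } finally {
          consumer.close()
        }
      }
    }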
Confidential
SOFTWARE ENGINEER - JAVA DEVELOPER
Responsibilities:
- Involved in analysis and design phase of Software Development Life cycle (SDLC).
- Used JMS to pass messages as payload to track statuses, milestones and states in the workflows.
- Involved in reading and generating PDF documents using iText, and merging PDFs dynamically.
- Involved in the software development life cycle: coding, testing, and implementation.
- Worked in the healthcare domain.
- Used Java Message Service (JMS) for loosely coupled, reliable, and asynchronous exchange of patient treatment information among J2EE components and legacy systems.
- Developed MDBs using JMS to exchange messages between different applications using MQ Series.
- Involved in working with J2EE Design patterns (Singleton, Factory, DAO, and Business Delegate) and Model View Controller Architecture with JSF and Spring DI.
- Involved in Content Management using XML.
- Developed a standalone module to transform 837 XML files into the database using a SAX parser.
- Installed, Configured and administered WebSphere ESB v6.x
- Worked on Performance tuning of WebSphere ESB in different environments on different platforms.
- Configured and Implemented web services specifications in collaboration with offshore team.
- Involved in creating dashboard charts (business charts) using FusionCharts.
- Involved in creating reports for most of the business criteria.
- Involved in configuring WebLogic servers, data sources (DSs), JMS queues, and deployments.
- Involved in creating queues, MDBs, and workers to accommodate the messaging needed to track the workflows.
- Created Hibernate mapping files, sessions, transactions, and Query/Criteria objects to fetch data from the database.
- Enhanced the design of an application by utilizing SOA.
- Generated unit test cases with the help of internal tools.
- Used JNDI for connection pooling.
- Developed ANT scripts to build and deploy projects onto the application server.
- Involved in implementing CruiseControl as the continuous build tool using Ant.
- Used StarTeam for version control.
Environment: Java multithreading, JDBC, Hibernate, Struts, Collections, Maven, Subversion, JUnit, SQL, JSP, SOAP, Servlets, Spring, Oracle, XML, PuTTY and Eclipse.