Data Engineer Resume
Saint Louis, MO
SUMMARY
- 7+ years of technical expertise across the complete software development life cycle (SDLC), including 6 years of data engineering experience with Hadoop and the Big Data stack.
- Hands-on experience with the Spark and Hadoop ecosystems, including MapReduce, Sqoop, Hive, Pig, Flume, Kafka, and ZooKeeper, and with NoSQL databases such as HBase.
- Excellent knowledge and understanding of distributed computing and parallel processing frameworks.
- Strong experience developing end-to-end Spark applications in Scala.
- Worked extensively on troubleshooting memory-management and resource-management issues within Spark applications.
- Strong knowledge of fine-tuning Spark applications and Hive scripts (a representative sketch follows this list).
- Wrote complex MapReduce jobs to perform data transformations on large-scale datasets.
- Experience installing, configuring, and monitoring Hadoop clusters both in-house and in the cloud (AWS).
- Good experience working with AWS cloud services such as S3, EMR, Redshift, Athena, and the Glue metastore.
- Extended Hive core functionality by writing custom UDFs for data analysis.
- Handled data imports from various sources, performed transformations, and developed and debugged MR2 jobs to process large datasets.
- Experience writing queries in HQL (Hive Query Language) for data analysis.
- Created Hive external and managed tables.
- Implemented partitioning and bucketing on Hive tables for query optimization.
- Experienced in writing Oozie workflows and coordinator jobs to schedule sequential Hadoop jobs.
- Experience using Apache Flume to collect, aggregate, and move large amounts of data from application servers.
- Extensive experience using Sqoop to ingest data from relational databases.
- Good knowledge of Kafka for streaming real-time feeds from external REST applications into Kafka topics.
- Built real-time data workflows using Kafka, Spark Streaming, and HBase.
- Good understanding of relational databases such as MySQL, Postgres, Oracle, and Teradata.
- Experienced with Git and SVN.
- Comfortable with build tools such as Apache Maven and SBT.
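As a concrete illustration of the Spark fine-tuning referenced above, here is a minimal sketch of a broadcast join with a partitioned output; the bucket paths, tuning values, and column names are hypothetical, not taken from any actual engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object TunedJoinExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tuned-join-example")
      // Illustrative tuning knobs: shuffle parallelism and Kryo serialization.
      .config("spark.sql.shuffle.partitions", "200")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    val events = spark.read.parquet("s3://example-bucket/events/") // large fact data
    val users  = spark.read.parquet("s3://example-bucket/users/")  // small dimension

    // Broadcasting the small side avoids shuffling the large dataset.
    val joined = events.join(broadcast(users), Seq("user_id"))

    // A partitioned layout keeps downstream queries pruned to one day's data.
    joined.write
      .partitionBy("event_date")
      .mode("overwrite")
      .parquet("s3://example-bucket/enriched/")
  }
}
```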
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Spark 2.x/1.x, YARN, Kafka 2.10, Flume, Sqoop, Impala, Oozie, ZooKeeper, Ambari
Cloud Environment: AWS, Google Cloud
Hadoop Distributions: Cloudera CDH 6.1/5.12/5., Hortonworks, MapR
ETL: Talend
Languages: Python, Shell Scripting, Scala
NoSQL Databases: MongoDB, HBase, DynamoDB
Development / Build Tools: Eclipse, Git, IntelliJ, Log4j
RDBMS: Oracle 10g, 11i, MS SQL Server, DB2
Testing: MRUnit Testing, Quality Center (QC)
Virtualization: VMware, AWS/EC2, Google Compute Engine
Build Tools: Maven, Ant, SBT
PROFESSIONAL EXPERIENCE
Confidential, Saint Louis, MO
Data Engineer
Responsibilities:
- Developed custom input adapters to ingest clickstream data from external sources such as FTP servers into S3-backed data lakes on a daily basis.
- Created Spark applications in Scala to perform a series of enrichments on the clickstream data, combining it with enterprise user data.
- Implemented batch processing jobs using the Spark Scala API.
- Developed Sqoop scripts to import and export data between Teradata and HDFS and load it into Hive tables.
- Optimized Hive tables with techniques such as partitioning and bucketing to improve HiveQL query performance.
- Worked with multiple file formats, including Avro, Parquet, and ORC.
- Converted existing MapReduce programs to Spark applications for handling semi-structured data such as JSON files, Apache log files, and other custom log data.
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the sketch after this list).
- Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, effective and efficient joins, transformations, and other optimizations.
- Worked extensively with Sqoop for importing data from Teradata.
- Implemented business logic in Hive and wrote UDFs to process the data for analysis.
- Utilized AWS services such as S3, EMR, Redshift, Athena, and the Glue metastore for building and managing data pipelines in the cloud.
- Automated EMR cluster creation and termination using the AWS Java SDK.
- Loaded the processed data into Redshift clusters using the Spark-Redshift integration.
- Created views within Athena to allow downstream reporting and data analysis teams to query and analyze the results.
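A minimal sketch of the Kafka-to-HBase streaming path described above, using the Spark Streaming Kafka 0.10 integration and the HBase client API; the broker address, consumer group, topic, and table names are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object ClicksToHBase {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("clicks-to-hbase"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092", // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clicks-consumer",
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clicks"), kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Open one HBase connection per partition, not per record.
        val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("clicks"))
        records.foreach { rec =>
          // Fall back to the Kafka offset when the message has no key.
          val rowKey = Option(rec.key).getOrElse(rec.offset.toString)
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event"), Bytes.toBytes(rec.value))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```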
Environment: AWS services (S3, EMR, Redshift, Athena, Glue metastore), Spark, Hive, Teradata, Scala, Python.
Confidential, Tampa, FL
Data Engineer
Responsibilities:
- Developed Spark applications in Scala using DataFrames and the Spark SQL API for faster data processing.
- Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities according to requirements.
- Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for high data volumes.
- Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Analyzed SQL scripts and designed solutions for implementation in Scala.
- Built real-time data pipelines by developing Kafka producers and Spark Streaming applications to consume the data (see the producer sketch after this list).
- Ingested syslog messages, parsed them, and streamed the data to Kafka.
- Imported data from different sources using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HDFS.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
- Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
- Analyzed the data with Hive queries (HiveQL) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed HiveQL scripts to de-normalize and aggregate the data.
- Scheduled and executed workflows in Oozie to run various jobs.
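A minimal sketch of a syslog-style Kafka producer like the one described above; the broker address, topic name, and assumed line format are hypothetical.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SyslogProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092") // hypothetical broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all") // wait for the full in-sync replica set

    val producer = new KafkaProducer[String, String](props)
    try {
      // Read syslog lines from stdin; a real adapter would listen on a socket.
      scala.io.Source.stdin.getLines().foreach { line =>
        // Key by hostname (4th whitespace field in classic syslog) so one
        // host's messages land on one partition and stay ordered.
        val host = line.split("\\s+").lift(3).getOrElse("unknown")
        producer.send(new ProducerRecord[String, String]("syslog", host, line))
      }
    } finally {
      producer.close()
    }
  }
}
```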
Environment: Hadoop, HDFS, HBase, Spark, Scala, Hive, MapReduce, Sqoop, ETL, Java
Confidential, Sterling, VA
Hadoop Engineer
Responsibilities:
- Involved in requirement analysis, design, coding, and implementation phases of the project.
- Loaded data from Teradata to MapR using the Teradata Hadoop connectors.
- Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames, and the Spark SQL APIs.
- Wrote new Spark jobs in Scala to analyze customer and sales-history data.
- Used Kafka to get data from many streaming sources into HDFS.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Applied Hive partitioning and bucketing and performed different types of joins on Hive tables.
- Created Hive external tables to perform ETL on data generated on a daily basis.
- Wrote HBase bulk-load jobs to load processed data into HBase tables by converting it to HFiles.
- Performed validation on the data ingested to filter and cleanse the data in Hive.
- Created Sqoop jobs to handle incremental loads from RDBMS into HDFS and applied Spark transformations.
- Loaded data into Hive tables from Spark using the ORC columnar format (see the sketch after this list).
- Developed Oozie workflows to automate and productionize the data pipelines.
- Developed Sqoop import Scripts for importing data from Netezza.
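A minimal sketch of the Spark-to-Hive ORC load described above; the database, table, and partition column names are hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object LoadOrcTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("load-orc-table")
      .enableHiveSupport() // talk to the Hive metastore
      .getOrCreate()

    // Hypothetical staging table of already-validated records.
    val sales = spark.table("staging.sales_clean")

    // Append into a date-partitioned Hive table stored as ORC.
    sales.write
      .format("orc")
      .partitionBy("sale_date")
      .mode(SaveMode.Append)
      .saveAsTable("warehouse.sales")
  }
}
```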
Environment: HDP, MapReduce, Spark, YARN, Hive, Tez, HBase, Oozie, Sqoop, Flume, Teradata, Netezza.
Confidential
Hadoop Developer
Responsibilities:
- Worked on migrating MapReduce programs into Spark transformations using Spark and Python.
- Developed Spark jobs in Python on YARN/MRv2 for interactive and batch analysis.
- Queried data using Spark SQL with the Spark engine for faster dataset processing (see the sketch after this list).
- Extensively used Elastic Load Balancing with Auto Scaling to scale EC2 capacity across multiple Availability Zones in a region, distributing high incoming traffic for the application with zero downtime.
- Created Partitioned Hive tables and worked on them using HiveQL.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Used the DataFrame and Dataset APIs to perform analysis on Hive tables.
- Monitored the Hadoop cluster using Cloudera Manager, interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them per the recommendations.
- Responsible for Cloudera Hadoop upgrades and patches and for installing ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
- Used Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice-versa.
- Worked with continuous-integration tools such as Jenkins and automated end-of-day JAR builds.
- Developed Unix shell scripts to load many files into HDFS from the Linux file system.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Used Impala connectivity from the user interface (UI) and queried the results using Impala SQL.
- Used ZooKeeper to coordinate the servers in clusters and maintain data consistency.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager and the web UI.
- Used Oozie operational services for batch processing and for scheduling workflows dynamically.
- Managed and scheduled several jobs to run over a certain period on Hadoop cluster using Oozie.
- Supported the setup of the QA environment and implemented scripts with Pig, Hive, and Sqoop.
- Followed Agile methodology for the entire project and supported testing teams.
- Worked with customers and the product manager to prioritize and validate requirements.
- Completed plans for long-term goals using Microsoft Project.
- Coordinated the work efforts of an 8-person team across various projects, helping the team complete tasks successfully and on time and resolving obstacles encountered by team members.
- Coordinated and participated in weekly estimation meetings to provide high-level estimates (Story Points) for backlog items.
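A minimal sketch of the kind of Spark SQL query against a partitioned Hive table described above, shown in Scala for consistency with the earlier sketches (this project itself used Python/PySpark); the table, column names, and date are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object CustomerMetrics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("customer-metrics")
      .enableHiveSupport()
      .getOrCreate()

    // The filter on the partition column prunes the scan to one day's data.
    val daily = spark.sql(
      """SELECT customer_id, COUNT(*) AS visits, SUM(amount) AS spend
        |FROM warehouse.orders
        |WHERE order_date = '2016-03-01'
        |GROUP BY customer_id""".stripMargin)

    daily.show(20)
  }
}
```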
Environment: Hadoop, HDFS, Hive, MapReduce, Impala, Sqoop, SQL, Talend, Python, PySpark, YARN, Pig, Oozie, Linux-Ubuntu, AWS, Tableau, Maven, Jenkins, Cloudera, JUnit, Agile methodology.
Confidential
Java Developer
Responsibilities:
- Reviewed requirements with the support group and developed an initial prototype.
- Involved in the analysis, design, and development of application components using JSP and Servlets, following J2EE design patterns.
- Wrote specifications for the development.
- Wrote JSPs and Servlets and deployed them on the WebLogic application server.
- Implemented the Struts framework based on the Model-View-Controller (MVC) design paradigm.
- Created the Struts config XML file and defined the action mappings.
- Designed the application by implementing Struts MVC, with simple JavaBeans as the model, JSP UI components as the view, and the ActionServlet as the controller.
- Wrote Oracle PL/SQL stored procedures, triggers, and views for backend database access.
- Used JSP and HTML on the front end, Servlets as front controllers, and JavaScript for client-side validations.
- Participated in server-side and client-side programming.
- Wrote SQL stored procedures and used JDBC to connect to the database (see the sketch after this list).
- Designed, developed, and maintained the data layer using JDBC, and configured the Java application framework.
- Worked on triggers and stored procedures on the Oracle database.
- Worked on Eclipse IDE to write the code and integrate the application.
- Communicated between different applications using JMS.
- Worked extensively with PL/SQL and SQL.
- Developed different modules using J2EE (Servlets, JSP, JDBC, JNDI).
- Tested and validated the application on different testing environments.
- Performed functional, integration and validation testing.
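A minimal sketch of calling a stored procedure over JDBC as described above, shown in Scala for consistency with the earlier sketches (the original work was in Java, and the java.sql API is the same); the connection details and procedure name are hypothetical.

```scala
import java.sql.{Connection, DriverManager}

object OrderDao {
  def updateOrderStatus(orderId: Long, status: String): Unit = {
    // Hypothetical Oracle connection; requires the Oracle JDBC driver on the classpath.
    val conn: Connection = DriverManager.getConnection(
      "jdbc:oracle:thin:@dbhost:1521:orcl", "app_user", "secret")
    try {
      // Invoke a hypothetical PL/SQL stored procedure via a CallableStatement.
      val stmt = conn.prepareCall("{call update_order_status(?, ?)}")
      stmt.setLong(1, orderId)
      stmt.setString(2, status)
      stmt.execute()
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```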
Technical Environment: Java, J2EE, Struts, JSP, HTML, Servlets, JavaScript, Rational Rose, SQL, PL/SQL, JDBC, MS Excel, UML, Apache Tomcat.