Data Engineer Resume
Bentonville, AR
PROFESSIONAL SUMMARY:
- More than 8 years of IT experience in the Software Development Life Cycle (Analysis, Design, Development, Testing, Deployment and Support) using Waterfall and Agile methodologies.
- 4+ years of experience in data analysis using Hadoop ecosystem components (Spark, HDFS, MapReduce, Sqoop, Hive) in the retail, financial, and healthcare sectors.
- Experience with NoSQL databases like HBase and Cassandra.
- Hands-on experience with SequenceFile, RCFile, Avro, and Parquet file formats.
- Experience running Hive scripts and writing Unix/Linux shell scripts.
- Designed Hive queries and table layouts to perform data analysis, transfer data, and load data into the Hadoop environment.
- Implemented Sqoop to transfer large datasets between RDBMS and HDFS in both directions.
- Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Hands-on experience designing and developing Spark applications in Scala and Python.
- Experience developing Scala scripts to run on Spark clusters.
- Created partitioned and bucketed Hive tables and used columnar formats such as Parquet and ORC to store the data.
- Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
- Strong experience working with Spark Dataframes, Spark SQL, Spark ML and Spark Streaming APIs.
- Developed Kafka producers to publish real-time streaming feeds to Kafka topics.
- Developed Spark Streaming applications to consume JSON messages from Kafka topics and write them to HBase (see the sketch after this summary).
- Extensive knowledge of data ingestion and stream processing platforms such as Flume and Kafka.
- Strong experience troubleshooting failures in Spark applications and tuning them for better performance.
- Extensive experience working with Cloudera and Hortonworks Hadoop distributions on multi-node clusters.
- Used Qlik Sense Cloud to create interactive reports and dashboards with charts and graphs.
- Worked in Agile teams, participating in daily scrum meetings and sprint planning.
- Experience using SQL Server 2012/2014/2016, MySQL, PostgreSQL, SQLite3, and Oracle.
- Experience in using IDEs like Eclipse, IntelliJ.
- Hands-on experience writing SQL queries, stored procedures, functions, and triggers.
- Proficient with Git, Jenkins, and Maven.
- Enthusiastic and quick to learn new applications and tools, willing to take on individual responsibilities, and a good team player with a strong ability to learn and adapt to new skills.
- Good analytical, communication, and problem-solving skills, and eager to learn new technical and functional skills.
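The Kafka-to-Spark pattern referenced above, shown as a minimal Structured Streaming sketch in PySpark. This is illustrative only: the broker address, topic name, and message schema are hypothetical, and since writing to HBase requires a separate connector, the sketch writes to the console instead (it also assumes the spark-sql-kafka package is on the classpath).
```python
# Illustrative sketch of the Kafka -> Spark Streaming pattern described above.
# Broker, topic, and schema are hypothetical; output goes to the console
# because the HBase sink used in the original work needs a separate connector.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-json-stream").getOrCreate()

# Hypothetical schema for the JSON messages on the topic
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
       .option("subscribe", "events")                       # hypothetical topic
       .load())

# Kafka values arrive as bytes; cast to string, then parse the JSON payload
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

query = (parsed.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```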
AREAS OF EXPERTISE:
Big Data Ecosystem: HDFS, MapReduce, Hive, Impala, YARN, Hue, Oozie, ZooKeeper, Solr, Apache Spark, Apache Kafka, Sqoop, Flume.
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: Scala, Python, HiveQL, Ruby on Rails, C, C++, Java.
Scripting Languages: Shell Scripting, JavaScript
BI Tools: Qlik Sense Cloud, Power BI.
Databases: SQL Server, Oracle, Teradata, DB2, PostgreSQL, MySQL, SQLite3
Cluster Management: Hortonworks, Cloudera Manager
Operating Systems: Windows, Mac, Unix, Linux
Version Control Tools: SVN, GitHub, Bitbucket, GitLab.
PROFESSIONAL EXPERIENCE:
Data Engineer
Confidential, Bentonville, AR
Responsibilities:
- Developed TDCH (Teradata Connector for Hadoop) scripts for importing and exporting data between Teradata and HDFS/Hive.
- Used the Fair Scheduler to allocate resources in YARN.
- Responsible for managing data coming from different sources.
- Scheduled automated jobs using the cron scheduler.
- Involved in creating Hive Tables, loading with data and writing Hive queries.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Read ORC files into Spark DataFrames for downstream processing (see the sketch after this section).
- Performed data transformations and analytics on large datasets using Spark.
- Worked with Spark Core and Spark SQL using PySpark.
- Performed performance optimizations on Spark/Python jobs.
- Developed Python scripts to run on the Spark cluster.
- Used the Python collections framework to store and process complex consumer information.
- Integrated Spark jobs with the MLP platform.
Environment: Hadoop, HDFS, Hive, Spark, Python, Oozie, Cron, Teradata, Yarn, Unix, Hortonworks, TDCH, Spark SQL.
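A minimal, illustrative PySpark sketch of the ORC-to-DataFrame-to-Spark SQL flow referenced above; the HDFS path, column names, and aggregation are hypothetical placeholders rather than the actual project code.
```python
# Sketch: read ORC into a DataFrame, query it with Spark SQL, write ORC back out.
# Paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orc-analytics").enableHiveSupport().getOrCreate()

# Read ORC files into a DataFrame
orders = spark.read.orc("/data/retail/orders_orc")  # hypothetical HDFS path

# Register a temp view so the same data can be queried with Spark SQL
orders.createOrReplaceTempView("orders")

daily_sales = spark.sql("""
    SELECT order_date, store_id, SUM(amount) AS total_sales
    FROM orders
    GROUP BY order_date, store_id
""")

# Equivalent transformation expressed with the DataFrame API
daily_sales_df = (orders.groupBy("order_date", "store_id")
                  .agg(F.sum("amount").alias("total_sales")))

daily_sales.write.mode("overwrite").orc("/data/retail/daily_sales_orc")
```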
Hadoop/ Spark Developer
Confidential, Russellville, AR
Responsibilities:
- Extracted the data from RDBMS into HDFS using Sqoop.
- Developed UDFs for Hive and wrote complex Hive queries for data analysis.
- Created Cassandra tables to store data arriving in varying formats from different portfolios (see the sketch after this section).
- Built ETL processes to load data from flat files into the target database, applying business logic in the transformation mappings to insert and update records during the load.
- Imported data from sources such as HDFS and Hive into Spark RDDs.
- Worked with Spark Core and Spark SQL using Scala.
- Developed Scala scripts to run on the Spark cluster.
- Used the Scala collections framework to store and process complex consumer information.
- Used Scala to implement a fault-tolerance mechanism that handles various types of error messages and reprocesses them without concurrency issues.
- Worked with different file formats: Avro, RCFile, and ORC.
- Created and worked on Sqoop jobs with incremental load to populate Hive External tables.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Performed data transformations and analytics on large datasets using Spark.
- Integrated the Qlik Sense Cloud BI tool with Impala and analyzed the data.
Environment: Hadoop, HDFS, Sqoop, Hive, Cassandra, Scala, Spark, Kafka, Linux, Qlik Sense Cloud, SQL.
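An illustrative sketch of creating and writing to a Cassandra table like the one described in this role. It uses the DataStax Python driver purely for illustration (the project work itself was in Scala/Spark); the contact point, keyspace, table, and column names are hypothetical.
```python
# Sketch: create a Cassandra keyspace/table for variable-format portfolio data
# and insert one record. Names and contact point are hypothetical.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-node1"])  # hypothetical contact point
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS portfolios
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# A text payload column holds the variable-format data per portfolio record
session.execute("""
    CREATE TABLE IF NOT EXISTS portfolios.records (
        portfolio_id text,
        record_ts timestamp,
        source text,
        payload text,
        PRIMARY KEY (portfolio_id, record_ts)
    )
""")

session.execute(
    "INSERT INTO portfolios.records (portfolio_id, record_ts, source, payload) "
    "VALUES (%s, toTimestamp(now()), %s, %s)",
    ("PF-001", "flat_file", '{"balance": 1024.50}'),
)
cluster.shutdown()
```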
Hadoop Developer
Confidential, Dallas, TX
Responsibilities:
- Implemented real time data pipelines using Kafka and Spark Streaming.
- Configured Flume to transport web server logs into HDFS.
- Developed Spark applications to perform data preparation and other analytics on the data.
- Worked extensively with the Databricks cloud platform on AWS.
- Developed multiple Kafka producers and consumers per the specifications (see the sketch after this section).
- Configured Spark Streaming to receive real-time data and store the streamed data in S3.
- Explored Spark to improve the performance and optimize the existing algorithms in Hadoop.
- Experienced with Spark Core, Spark-SQL, Data Frame, RDDs and YARN.
- Designed and developed Hive tables to store staging and historical data.
- Created Hive tables as per requirements; internal and external tables were defined with appropriate static and dynamic partitions for efficiency.
- Used the Parquet file format with Snappy compression for optimized storage of Hive tables.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Designed and implemented Java MapReduce programs to support distributed data processing.
- Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and DataFrames API to load structured and semi-structured data into Spark clusters.
- Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
- Developed Sqoop jobs with incremental load to populate Hive External tables.
- Worked with Amazon Web Services (AWS) cloud services such as EC2, EMR, S3, and Redshift.
- Involved in setting up and managing sessions; responsible for mentoring peers and leading technical design.
- Implemented the workflows using Apache Oozie framework to automate tasks.
Environment: Databricks, Hadoop, S3, Hive, Pig, Spark, Scala, Sqoop, Flume, HBase, YARN, RDBMS, Oozie.
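A minimal sketch of a JSON producer and consumer like those described in this role, written with the kafka-python client (the resume does not name the client library, so that choice is an assumption); the broker address, topic, consumer group, and message fields are hypothetical.
```python
# Sketch: publish and consume JSON messages with kafka-python.
# Broker, topic, group, and fields are hypothetical.
import json
from kafka import KafkaConsumer, KafkaProducer

BROKERS = "broker1:9092"   # hypothetical broker
TOPIC = "clickstream"      # hypothetical topic

producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"user_id": "u-42", "page": "/home", "ts": 1700000000})
producer.flush()

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    group_id="analytics",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)   # in the real pipeline this fed Spark Streaming
    break                  # stop after one message for the sketch
```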
Java/ Hadoop Developer
Confidential, Herndon, VA
Responsibilities:
- Designed docs and specs for the near real-time data analytics using Hadoop and HBase.
- Installed Cloudera Manager on the clusters.
- Used a 15-node cluster on Amazon EC2.
- Developed ad-click-based data analytics for keyword analysis and insights.
- Crawled public Facebook posts and tweets.
- Used Solr search engine to search multiple sites and return recommendations.
- Used Flume and Kafka to get the streaming data from Twitter and Facebook.
- Used MongoDB to capture streaming data.
- Worked on MongoDB using CRUD (Create, Read, Update, Delete) operations, indexing, replication, and sharding features (see the sketch after this section).
- Wrote MapReduce jobs with the Data Science team to analyze this data.
- Converted the output to structured data and imported it into Informatica with the analytics team.
Environment: Hadoop, MongoDB, HDFS, MapReduce, Flume, Java, Informatica, Cloudera Manager, Amazon EC2, Solr.
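An illustrative pymongo sketch of the CRUD and indexing operations listed above; the connection URI, database, collection, and document fields are hypothetical.
```python
# Sketch: MongoDB CRUD and indexing with pymongo. URI, database, and fields
# are hypothetical.
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical URI
posts = client["social"]["posts"]

# Create
posts.insert_one({"source": "twitter", "keyword": "ad-clicks", "text": "..."})

# Read
doc = posts.find_one({"keyword": "ad-clicks"})

# Update
posts.update_one({"_id": doc["_id"]}, {"$set": {"processed": True}})

# Delete
posts.delete_one({"_id": doc["_id"]})

# Index to speed up keyword lookups
posts.create_index([("keyword", ASCENDING)])
```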
Java Developer
Confidential
Responsibilities:
- Gathered requirements from end users and created functional requirements.
- Contributed to process flow analysis of the functional requirements.
- Developed the graphical user interface for the user self-service screen.
- Implemented the four-eyes principle and created a quality-check process reusable across all workflows at the platform level.
- Developed UI models using HTML, JSP, JavaScript, Web Link, and CSS.
- Developed Struts Action classes and Validation classes using Struts controller component and Struts validation framework.
- Supported end users, testing, and documentation.
- Implemented backing beans for handling UI components and storing their state in scope.
- Worked on implementing EJB stateless session beans for communicating with the controller.
- Implemented database integration using Hibernate and used Spring with Hibernate for mapping to the Oracle database.
- Worked on Oracle PL/SQL queries to Select, Update and Delete data.
- Worked on Maven for build automation. Used Git for version control.
Environment: Java, J2EE, JSP, Maven, Linux, CSS, Git, Oracle, XML, SAX, Rational Rose, UML.