Data Engineer Resume
Bentonville, AR
PROFESSIONAL SUMMARY:
- More than 8 years of IT experience in the Software Development Life Cycle (Analysis, Design, Development, Testing, Deployment and Support) using Waterfall and Agile methodologies.
- 4+ years of experience in data analysis using Hadoop ecosystem components (Spark, HDFS, MapReduce, Sqoop, Hive) in the retail, financial, and healthcare sectors.
- Experience with NoSQL databases like HBase and Cassandra.
- Hands-on experience with SequenceFile, RCFile, Avro, and Parquet file formats.
- Experience running Hive scripts and writing Unix/Linux shell scripts.
- Designed Hive queries and table layouts to perform data analysis, transfer data, and load data into the Hadoop environment.
- Implemented Sqoop to transfer large datasets between RDBMS and HDFS in both directions.
- Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Hands-on experience designing and developing Spark applications in Scala and Python.
- Experience developing Scala scripts to run on Spark clusters.
- Created partitioned and bucketed Hive tables and used columnar formats such as Parquet and ORC to store the data.
- Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
- Strong experience working with Spark Dataframes, Spark SQL, Spark ML and Spark Streaming APIs.
- Developed Kafka producers to publish real-time streaming feeds to Kafka topics.
- Developed Spark Streaming applications to consume JSON messages from Kafka topics and write them to HBase (see the sketch after this summary).
- Extensive knowledge of data ingestion and stream processing platforms such as Flume and Kafka.
- Strong experience troubleshooting failures in Spark applications and tuning them for better performance.
- Extensive experience working with Cloudera and Hortonworks Hadoop distributions on multi-node clusters.
- Used Qlik Sense Cloud to create interactive reports and dashboards with charts and graphs.
- Worked in Agile teams, participating in daily scrum meetings and sprint planning.
- Experience using SQL Server 2012/2014/2016, MySQL, PostgreSQL, SQLite3, and Oracle.
- Experience in using IDEs like Eclipse, IntelliJ.
- Hands-on experience writing SQL queries, stored procedures, functions, and triggers.
- Proficient with Git, Jenkins, and Maven.
- Enthusiastic and quick to learn new applications and tools, willing to take on individual responsibilities, and a good team player with a strong ability to learn and adapt to new skills.
- Good analytical, communication, and problem-solving skills, and eager to learn new technical and functional skills.
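The Kafka-to-Spark pattern referenced above, shown as a minimal Structured Streaming sketch in PySpark. This is illustrative only: the broker address, topic name, and message schema are hypothetical, and since writing to HBase requires a separate connector, the sketch writes to the console instead (it also assumes the spark-sql-kafka package is on the classpath).
```python
# Illustrative sketch of the Kafka -> Spark Streaming pattern described above.
# Broker, topic, and schema are hypothetical; output goes to the console
# because the HBase sink used in the original work needs a separate connector.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-json-stream").getOrCreate()

# Hypothetical schema for the JSON messages on the topic
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
       .option("subscribe", "events")                       # hypothetical topic
       .load())

# Kafka values arrive as bytes; cast to string, then parse the JSON payload
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

query = (parsed.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```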
AREAS OF EXPERTISE:
Big Data Ecosystem: HDFS, MapReduce, Hive, Impala, YARN, Hue, Oozie, ZooKeeper, Solr, Apache Spark, Apache Kafka, Sqoop, Flume.
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: Scala, Python, HiveQL, Ruby on Rails, C, C++, Java.
Scripting Languages: Shell Scripting, JavaScript
BI Tools: Qlik Sense Cloud, Power BI.
Databases: SQL Server, Oracle, Teradata, DB2, PostgreSQL, MySQL, SQLite3
Cluster Management: Hortonworks, Cloudera Manager
Operating Systems: Windows, Mac, Unix, Linux
Version Control Tools: SVN, GitHub, Bitbucket, GitLab.
PROFESSIONAL EXPERIENCE:
Data Engineer
Confidential, Bentonville, AR
Responsibilities:
- Developed TDCH (Teradata Connector for Hadoop) scripts for importing and exporting data between Teradata and HDFS/Hive.
- Used the Fair Scheduler to allocate resources in YARN.
- Responsible for managing data coming from different sources.
- Scheduled automated jobs using the cron scheduler.
- Involved in creating Hive Tables, loading with data and writing Hive queries.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Read ORC files into Spark DataFrames for downstream processing (see the sketch after this section).
- Performed data transformations and analytics on large datasets using Spark.
- Worked with Spark Core and Spark SQL using PySpark.
- Performed performance optimizations on Spark/Python jobs.
- Developed Python scripts to run on the Spark cluster.
- Used the Python collections framework to store and process complex consumer information.
- Integrated Spark jobs with the MLP platform.
Environment: Hadoop, HDFS, Hive, Spark, Python, Oozie, Cron, Teradata, Yarn, Unix, Hortonworks, TDCH, Spark SQL.
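A minimal, illustrative PySpark sketch of the ORC-to-DataFrame-to-Spark SQL flow referenced above; the HDFS path, column names, and aggregation are hypothetical placeholders rather than the actual project code.
```python
# Sketch: read ORC into a DataFrame, query it with Spark SQL, write ORC back out.
# Paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orc-analytics").enableHiveSupport().getOrCreate()

# Read ORC files into a DataFrame
orders = spark.read.orc("/data/retail/orders_orc")  # hypothetical HDFS path

# Register a temp view so the same data can be queried with Spark SQL
orders.createOrReplaceTempView("orders")

daily_sales = spark.sql("""
    SELECT order_date, store_id, SUM(amount) AS total_sales
    FROM orders
    GROUP BY order_date, store_id
""")

# Equivalent transformation expressed with the DataFrame API
daily_sales_df = (orders.groupBy("order_date", "store_id")
                  .agg(F.sum("amount").alias("total_sales")))

daily_sales.write.mode("overwrite").orc("/data/retail/daily_sales_orc")
```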
Hadoop/ Spark Developer
Confidential, Russellville, AR
Responsibilities:
- Extracted the data from RDBMS into HDFS using Sqoop.
- Developed UDFs for Hive and wrote complex Hive queries for data analysis.
- Created Cassandra tables to store data arriving in varying formats from different portfolios (see the sketch after this section).
- Built ETL processes to load data from flat files into the target database, applying business logic in the transformation mappings to insert and update records during the load.
- Imported data from sources such as HDFS and Hive into Spark RDDs.
- Worked with Spark Core and Spark SQL using Scala.
- Developed Scala scripts to run on the Spark cluster.
- Used the Scala collections framework to store and process complex consumer information.
- Used Scala to implement a fault-tolerance mechanism that handles various types of error messages and reprocesses them without concurrency issues.
- Worked with different file formats: Avro, RCFile, and ORC.
- Created and worked on Sqoop jobs with incremental load to populate Hive External tables.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Performed data transformations and analytics on large datasets using Spark.
- Integrated the Qlik Sense Cloud BI tool with Impala and analyzed the data.
Environment: Hadoop, HDFS, Sqoop, Hive, Cassandra, Scala, Spark, Kafka, Linux, Qlik Sense Cloud, SQL.
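An illustrative sketch of creating and writing to a Cassandra table like the one described in this role. It uses the DataStax Python driver purely for illustration (the project work itself was in Scala/Spark); the contact point, keyspace, table, and column names are hypothetical.
```python
# Sketch: create a Cassandra keyspace/table for variable-format portfolio data
# and insert one record. Names and contact point are hypothetical.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-node1"])  # hypothetical contact point
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS portfolios
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# A text payload column holds the variable-format data per portfolio record
session.execute("""
    CREATE TABLE IF NOT EXISTS portfolios.records (
        portfolio_id text,
        record_ts timestamp,
        source text,
        payload text,
        PRIMARY KEY (portfolio_id, record_ts)
    )
""")

session.execute(
    "INSERT INTO portfolios.records (portfolio_id, record_ts, source, payload) "
    "VALUES (%s, toTimestamp(now()), %s, %s)",
    ("PF-001", "flat_file", '{"balance": 1024.50}'),
)
cluster.shutdown()
```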
Hadoop Developer
Confidential, Dallas, TX
Responsibilities:
- Implemented real time data pipelines using Kafka and Spark Streaming.
- Configured Flume to transport web server logs into HDFS.
- Developed Spark applications to perform data preparation and other analytics on the data.
- Worked extensively with the Databricks cloud platform on AWS.
- Developed multiple Kafka producers and consumers per the specifications (see the sketch after this section).
- Configured Spark Streaming to receive real-time data and store the streamed data in S3.
- Explored Spark to improve the performance and optimize the existing algorithms in Hadoop.
- Experienced with Spark Core, Spark-SQL, Data Frame, RDDs and YARN.
- Designed and developed Hive tables to store staging and historical data.
- Created Hive tables as per requirements; internal and external tables were defined with appropriate static and dynamic partitions for efficiency.
- Used the Parquet file format with Snappy compression for optimized storage of Hive tables.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Designed and implemented Java MapReduce programs to support distributed data processing.
- Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and DataFrames API to load structured and semi-structured data into Spark clusters.
- Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
- Developed Sqoop jobs with incremental load to populate Hive External tables.
- Worked with Amazon Web Services (AWS) cloud services such as EC2, EMR, S3, and Redshift.
- Involved in setting up and managing sessions; responsible for mentoring peers and leading technical design.
- Implemented the workflows using Apache Oozie framework to automate tasks.
Environment: Databricks, Hadoop, S3, Hive, Pig, Spark, Scala, Sqoop, Flume, HBase, YARN, RDBMS, Oozie.
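A minimal sketch of a JSON producer and consumer like those described in this role, written with the kafka-python client (the resume does not name the client library, so that choice is an assumption); the broker address, topic, consumer group, and message fields are hypothetical.
```python
# Sketch: publish and consume JSON messages with kafka-python.
# Broker, topic, group, and fields are hypothetical.
import json
from kafka import KafkaConsumer, KafkaProducer

BROKERS = "broker1:9092"   # hypothetical broker
TOPIC = "clickstream"      # hypothetical topic

producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"user_id": "u-42", "page": "/home", "ts": 1700000000})
producer.flush()

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    group_id="analytics",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)   # in the real pipeline this fed Spark Streaming
    break                  # stop after one message for the sketch
```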
Java/ Hadoop Developer
Confidential, Herndon, VA
Responsibilities:
- Designed docs and specs for the near real-time data analytics using Hadoop and HBase.
- Installed Cloudera Manager on the clusters.
- Used a 15-node cluster on Amazon EC2.
- Developed ad-click-based data analytics for keyword analysis and insights.
- Crawled public Facebook posts and tweets.
- Used Solr search engine to search multiple sites and return recommendations.
- Used Flume and Kafka to get the streaming data from Twitter and Facebook.
- Used MongoDB to capture streaming data.
- Worked on MongoDB using CRUD (Create, Read, Update, Delete) operations, indexing, replication, and sharding features (see the sketch after this section).
- Wrote MapReduce jobs with the Data Science team to analyze this data.
- Converted the output to structured data and imported it into Informatica with the analytics team.
Environment: Hadoop, MongoDB, HDFS, MapReduce, Flume, Java, Informatica, Cloudera Manager, Amazon EC2, Solr.
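An illustrative pymongo sketch of the CRUD and indexing operations listed above; the connection URI, database, collection, and document fields are hypothetical.
```python
# Sketch: MongoDB CRUD and indexing with pymongo. URI, database, and fields
# are hypothetical.
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical URI
posts = client["social"]["posts"]

# Create
posts.insert_one({"source": "twitter", "keyword": "ad-clicks", "text": "..."})

# Read
doc = posts.find_one({"keyword": "ad-clicks"})

# Update
posts.update_one({"_id": doc["_id"]}, {"$set": {"processed": True}})

# Delete
posts.delete_one({"_id": doc["_id"]})

# Index to speed up keyword lookups
posts.create_index([("keyword", ASCENDING)])
```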
Java Developer
Confidential
Responsibilities:
- Gathered requirements from end users and created functional requirements.
- Contributed to process flow analysis of the functional requirements.
- Developed the graphical user interface for the user self-service screen.
- Implemented the four-eyes principle and created a quality-check process reusable across all workflows at the platform level.
- Developed UI models using HTML, JSP, JavaScript, Web Link, and CSS.
- Developed Struts Action classes and Validation classes using Struts controller component and Struts validation framework.
- Supported end users, testing, and documentation.
- Implemented backing beans for handling UI components and storing their state in scope.
- Worked on implementing EJB stateless session beans for communicating with the controller.
- Implemented database integration using Hibernate and used Spring with Hibernate for mapping to the Oracle database.
- Worked on Oracle PL/SQL queries to Select, Update and Delete data.
- Worked on Maven for build automation. Used Git for version control.
Environment: Java, J2EE, JSP, Maven, Linux, CSS, Git, Oracle, XML, SAX, Rational Rose, UML.