
Data Engineer (Spark and Scala)/Hadoop Admin Resume


Raleigh, NC

SUMMARY

  • 8+ years of IT experience in architecture, analysis, design, development, implementation, maintenance, and support, with experience in developing strategic methods for deploying big data technologies to solve large-scale data processing requirements efficiently.
  • Experience in test and production environments across business domains such as financial services, insurance, and banking.
  • Experience in writing distributed Scala code for efficient big data processing (a minimal sketch follows this list).
  • 3 years of experience in big data using the Hadoop and Spark frameworks and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, and ZooKeeper.
  • Experience in data analysis using Hive, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Experience in writing custom UDFs in Java for Hive and Pig to extend their functionality.
  • Experience in writing MapReduce programs in Java for data cleansing and preprocessing.
  • Excellent understanding of Hadoop and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the YARN ResourceManager.
  • Experience in working with Flume to load log data from multiple sources directly into HDFS.
  • Experience in working with message broker services such as Kafka and Amazon SQS.
  • Experience in real-time data analysis using Spark Streaming and Storm.
  • Worked with different file formats such as flat files, SequenceFile, Avro, and Parquet.
  • Experience in compressing data with different algorithms such as gzip and bzip2.
  • Well versed in schema design and performance tuning.
  • Excellent understanding and knowledge of NoSQL databases such as HBase.
  • Experience in importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
  • Built real-time big data solutions using HBase, handling billions of records.
  • Good experience working with the MapR and Cloudera distributions.
  • Experience in designing both time-driven and data-driven automated workflows using Oozie.
  • Experience working with Apache Solr.
  • Experience in object-oriented analysis and design (OOAD) and software development using UML methodology, with good knowledge of J2EE and core Java design patterns.
  • Experience working with Java, J2EE, JDBC, ODBC, JSP, Eclipse, JavaBeans, EJB, Servlets, and MS SQL Server.
  • Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
  • Experience in writing UNIX shell scripts and Python scripts.
  • Experience in all stages of the SDLC (Agile, Waterfall).
  • Experience in writing technical design documents and in the development, testing, and implementation of enterprise-level data marts and data warehouses.
  • Good team player, strong interpersonal and communication skills combined with self-motivation, initiative and the ability to think outside the box.
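
As a rough illustration of the distributed Scala work described above, the sketch below cleanses and aggregates delimited log records from HDFS using Spark. It is a minimal, hypothetical example: the paths, field layout, and application name are illustrative and are not project code.

    import org.apache.spark.sql.SparkSession

    object LogCleanser {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("log-cleanser")
          .getOrCreate()

        // Hypothetical HDFS input path; adjust to the actual cluster layout.
        val raw = spark.sparkContext.textFile("hdfs:///data/raw/weblogs/*")

        // Basic cleansing and aggregation: drop malformed lines, count events per code.
        val counts = raw
          .map(_.split("\t"))
          .filter(_.length >= 3)            // skip records with too few fields
          .map(fields => (fields(2), 1L))   // assume the third field holds an event code
          .reduceByKey(_ + _)

        counts.saveAsTextFile("hdfs:///data/clean/event_counts")
        spark.stop()
      }
    }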

TECHNICAL SKILLS

Hadoop Ecosystem: MapReduce, HDFS, Hive, Pig, Sqoop, ZooKeeper, Oozie, Flume, HBase, Spark, Kafka

Languages: C, C++, Java, J2EE, Python, Scala, UML

Web and Scripting Technologies: JavaScript, JSP, Servlets, JDBC, Unix/Linux Shell Scripting, Python, HTML, XML

Methodologies: Waterfall, Agile/Scrum.

Databases: Oracle, MySQL, HBase

Application/Web server: Apache Tomcat, WebSphere and JBoss.

IDEs: Eclipse, NetBeans

ETL & Reporting Tools: Informatica, SAP Business Objects, Tableau

Cloud Infrastructures: Amazon Web Services.

PROFESSIONAL EXPERIENCE

Confidential, Raleigh, NC

Data Engineer (Spark and Scala)/Hadoop Admin

Responsibilities:

  • Developed Spark SQL scripts for data ingestion from Oracle into Spark clusters and performed the relevant data joins using Spark SQL (see the sketch after this list).
  • Experience building distributed, high-performance systems using Spark and Scala.
  • Experience developing Scala applications for loading/streaming data into NoSQL databases (MongoDB) and HDFS.
  • Designed distributed algorithms for identifying trends in data and processing them effectively.
  • Used Spark and Scala to develop machine learning algorithms that analyze clickstream data.
  • Experience in developing machine learning code using Spark MLlib.
  • Used Spark SQL for pre-processing, cleaning, and joining very large data sets.
  • Experience in creating a Spark-based data lake that serves downstream applications.
  • Designed and developed Scala workflows for pulling data from cloud-based systems and applying transformations to it.
  • Installed and configured a multi-node, fully distributed Hadoop cluster.
  • Involved in Hadoop cluster administration, including commissioning and decommissioning nodes, cluster capacity planning, balancing, performance tuning, monitoring, and troubleshooting.
  • Configured the Fair Scheduler to provide service-level agreements for multiple users of the cluster.
  • Implemented Hadoop NameNode HA to make the Hadoop services highly available.
  • Developed a cron job to store the NameNode metadata on an NFS-mounted directory.
  • Worked on installing Hadoop ecosystem components such as Sqoop, Pig, Hive, Oozie, and HCatalog.
  • Involved in HDFS maintenance and administration through the Hadoop Java API.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Proficient in writing Flume and Hive scripts to extract, transform, and load the data into the database.
  • Responsible for maintaining, managing, and upgrading Hadoop cluster connectivity and security.
  • Worked on developing machine learning algorithms for analyzing clickstream data using Spark and Scala.
  • Migrated databases from traditional data warehouses to Spark clusters.
  • Created data workflows and pipelines for transitioning data and analyzing trends using Spark MLlib.
  • Set up the entire project on the Amazon Web Services cloud and tuned all the algorithms for best performance.
  • Analyzed streaming data and identified important trends for further analysis using Spark Streaming and Storm.
  • Collected and aggregated large amounts of web log data from sources such as web servers and mobile and network devices using Apache Kafka, and stored the data in HDFS for analysis.
  • Experience configuring spouts and bolts in various Apache Storm topologies and validating data in the bolts.
  • Used Spark Streaming to collect data from Kafka in near real time and perform the necessary transformations and aggregations on the fly to build the common learner data model, persisting the data in a NoSQL store.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Batch-loaded data into NoSQL storage such as MongoDB.
  • Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Used Git to check in and check out code changes.
  • Used Jira for bug tracking.
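
The sketch below illustrates the kind of Spark SQL ingestion and join referenced in the first bullet: reading Oracle tables over JDBC, joining them with Spark SQL, and persisting the result for downstream use. It is a hedged, minimal example using the Spark 2.x-style API; the connection details, table names, join keys, and output path are all hypothetical.

    import org.apache.spark.sql.SparkSession

    object OracleIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("oracle-ingest")
          .getOrCreate()

        // Hypothetical JDBC connection details.
        val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCL"
        val props = new java.util.Properties()
        props.setProperty("user", "etl_user")
        props.setProperty("password", sys.env.getOrElse("ORACLE_PW", ""))
        props.setProperty("driver", "oracle.jdbc.OracleDriver")

        // Ingest two source tables and register them for Spark SQL joins.
        val orders    = spark.read.jdbc(jdbcUrl, "SALES.ORDERS", props)
        val customers = spark.read.jdbc(jdbcUrl, "SALES.CUSTOMERS", props)
        orders.createOrReplaceTempView("orders")
        customers.createOrReplaceTempView("customers")

        val joined = spark.sql(
          """SELECT c.customer_id, c.region, o.order_id, o.amount
            |FROM orders o JOIN customers c ON o.customer_id = c.customer_id""".stripMargin)

        // Persist the joined result for downstream jobs (path is illustrative).
        joined.write.mode("overwrite").parquet("hdfs:///data/lake/orders_joined")
        spark.stop()
      }
    }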

Environment: Scala, Apache Spark, AWS, Spark MLlib, Spark SQL, PostgreSQL, Hive, MongoDB, Apache Storm, Kafka, Git, Jira.

Confidential, Des Moines, IA

Hadoop Developer/Hadoop Admin

Responsibilities:

  • Wrote Apache Pig scripts to process HDFS data.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
  • Involved in requirements gathering, design, development, and testing.
  • Wrote script files for processing data and loading it into HDFS.
  • Stored and retrieved data in Hive using HQL.
  • Developed UNIX shell scripts for creating reports from Hive data.
  • Performed data ingestion using Kafka, data pipeline architecture, data cleansing, ETL, processing, and some visualization; enabled CDH to consume data from the customer’s enterprise tools (sources such as RabbitMQ, IBM MQ, and RDBMS).
  • Developed use cases with Hive, Pig, Spark, and Spark Streaming; implemented MapReduce jobs to discover interesting patterns in the data.
  • Installed and configured Hadoop cluster in Development, Testing and Production environments.
  • Performed both major and minor upgrades to the existing CDH cluster.
  • Responsible for monitoring and supporting Development activities.
  • Responsible for administering applications and their maintenance on a daily basis. Prepared the system design document covering all functional implementations.
  • Installed various Hadoop ecosystem components and Hadoop daemons.
  • Installed and configured Sqoop and Flume.
  • Involved in data modeling sessions to develop models for Hive tables.
  • Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, Sqoop, and Pig Latin.
  • Developed Java MapReduce programs that include custom data types, input formats, record readers, etc.
  • Involved in writing Flume and Hive scripts to extract, transform, and load the data into the database.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Converted ETL logic to Hadoop mappings.
  • Extensive hands-on experience with Hadoop file system commands for file-handling operations.
  • Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement (see the sketch after this list).
  • Worked with Sqoop import and export to handle large data set transfers between the DB2 database and HDFS.
  • Used Sentry to control access to databases and data sets.
  • Worked on Hadoop cluster security and tuned the cluster to meet the necessary performance standards.
  • Configured backups and performed NameNode recoveries from previous backups.
  • Experienced in managing and analyzing Hadoop log files.
  • Provided documentation on the architecture, deployment, and all details the customer would require to run the CDH cluster, as part of the delivery documents.
  • Supported MySQL and PostgreSQL as back-end databases for the Hive metastore, Cloudera Manager components, Oozie, etc.
  • Provided subject-matter expertise on Linux to support running CDH/Hadoop optimally on the underlying OS.
  • Trained customers and partners when required.
  • Analyzed customer requirements and identified how the Hadoop ecosystem could be leveraged to implement them, how CDH could fit into the current infrastructure, and where Hadoop could complement existing products.
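
The following is a minimal Scala sketch of the partitioned, bucketed Hive layout described above, issued through Spark's Hive support rather than the Hive CLI and scripts the project actually used; the database, table, columns, bucket count, and date are hypothetical.

    import org.apache.spark.sql.SparkSession

    object HiveTableSetup {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-table-setup")
          .enableHiveSupport()
          .getOrCreate()

        spark.sql("CREATE DATABASE IF NOT EXISTS sales")

        // Partitioned, bucketed Hive table stored as RCFile (names are illustrative).
        spark.sql(
          """CREATE TABLE IF NOT EXISTS sales.transactions (
            |  txn_id BIGINT,
            |  account_id STRING,
            |  amount DOUBLE)
            |PARTITIONED BY (txn_date STRING)
            |CLUSTERED BY (account_id) INTO 32 BUCKETS
            |STORED AS RCFILE""".stripMargin)

        // Partition pruning: only the requested day's partition is scanned.
        val daily = spark.sql(
          "SELECT account_id, SUM(amount) AS total FROM sales.transactions " +
          "WHERE txn_date = '2016-01-01' GROUP BY account_id")
        daily.show()

        spark.stop()
      }
    }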

Environment: Cloudera Hadoop Framework, MapReduce, Hive, Pig, HBase, Business Objects, Platfora, HParser, Java, Python, UNIX Shell Scripting.

Confidential, Omaha, Nebraska

Hadoop Developer

Responsibilities:

  • Worked with the business users to gather, define business requirements and analyze the possible technical solutions.
  • Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
  • Designed and built the reporting application that uses the Spark SQL to fetch and generate reports on HBase table data.
  • Extracted feeds from social media sites such as Facebook, Twitter using Python scripts.
  • Implemented helper classes that access HBase directly from Java using its Java API.
  • Integrated MapReduce with HBase to bulk-import large amounts of data into HBase using MapReduce programs.
  • Experienced in converting ETL operations to the Hadoop system using Pig Latin operations, transformations, and functions.
  • Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
  • Handled different kinds of time-series data using HBase to store the data and perform time-based analytics, improving query retrieval time (see the row-key sketch after this list).
  • Participated with admins in installing and configuring MapReduce, Hive, and HDFS.
  • Implemented CDH3 Hadoop cluster on CentOS, assisted with performance tuning and monitoring.
  • Used Hive to analyze data ingested into HBase and compute various metrics for reporting on the dashboard.
  • Managed and reviewed Hadoop log files.
  • Involved in review of functional and non-functional requirements.
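
The sketch below shows one common way to lay out time-series row keys in HBase so that the newest events for an entity sort first (a reversed-timestamp suffix), written in Scala against the HBase client API; the project accessed HBase through its Java API, and the table name, column family, and key format here are hypothetical.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object SensorEventStore {
      // Reverse the timestamp so the newest events sort first within an entity's rows.
      private def rowKey(sensorId: String, ts: Long): Array[Byte] =
        Bytes.toBytes(f"$sensorId%s-${Long.MaxValue - ts}%019d")

      def main(args: Array[String]): Unit = {
        val conf = HBaseConfiguration.create()   // picks up hbase-site.xml from the classpath
        val conn = ConnectionFactory.createConnection(conf)
        val table = conn.getTable(TableName.valueOf("sensor_events"))   // hypothetical table

        val put = new Put(rowKey("sensor-42", System.currentTimeMillis()))
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("reading"), Bytes.toBytes("73.2"))
        table.put(put)

        table.close()
        conn.close()
      }
    }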

Environment: Hadoop Framework, MapReduce, Hive, Sqoop, Pig, HBase, Flume, Oozie, Java, Python, UNIX Shell Scripting, Spark.

Confidential, Des Moines, IA

Hadoop Developer

Responsibilities:

  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Involved in loading data from the Linux file system into HDFS (see the sketch after this list).
  • Responsible for managing data coming from multiple sources.
  • Assisted in exporting analyzed data to relational databases (MySQL) using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Generated Tableau reports and built dashboards.
  • Worked closely with business units to define development estimates according to Agile methodology.
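
As a small illustration of loading data from the Linux file system into HDFS, the sketch below uses the Hadoop FileSystem API from Scala; in practice the loads may simply have used hdfs dfs -put or shell scripts, and the paths here are hypothetical.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object LoadToHdfs {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()   // reads core-site.xml / hdfs-site.xml from the classpath
        val fs = FileSystem.get(conf)

        // Hypothetical paths: copy a day's extract from the local file system into HDFS.
        val local = new Path("file:///data/extracts/2015-06-01")
        val dest  = new Path("hdfs:///landing/extracts/2015-06-01")

        fs.mkdirs(dest.getParent)
        fs.copyFromLocalFile(false /* keep source */, true /* overwrite */, local, dest)
        fs.close()
      }
    }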

Environment: JMS, Sonic Management, Apache Hadoop, HBase, Hive, Oozie, Crunch, MapReduce, Pig, Java, SQL.

Confidential

Java/J2EE Developer

Responsibilities:

  • Developed Servlets and JSP based on MVC pattern using Struts Action framework.
  • Parsed high-level design specs into simple ETL coding and mapping standards.
  • Involved in writing Hibernate queries and Hibernate specific configuration and mapping files.
  • Used Log4J logging framework to write Log messages with various levels.
  • Involved in fixing bugs and minor enhancements for the front-end modules.
  • Used JUnit framework for writing Test Classes.
  • Coded various classes for Business Logic Implementation.
  • Prepared and executed unit test cases.
  • Performed functional and technical reviews.
  • Assured quality in the deliverables.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Implemented Services using Core Java.
  • Developed and deployed UI layer logics of sites using JSP.
  • Used built-in/custom Interceptors and Validators of Struts.
  • Involved in the complete life cycle of the project from the requirements to the production support.

Environment: J2EE, JDBC, Java, Servlets, JSP, Struts, Hibernate, Web services, SOAP, WSDL, Design Patterns, MVC, HTML, JavaScript 1.2, WebLogic, XML and JUnit.

Confidential

Java Developer

Responsibilities:

  • Developed the user interface module using JSP, JavaScript, DHTML, and form beans for the presentation layer.
  • Developed Servlets and Java Server Pages (JSP).
  • Developed PL/SQL queries and wrote stored procedures and JDBC routines to generate reports based on client requirements.
  • Enhanced the system according to customer requirements.
  • Involved in the customization of the available functionalities of the software for an NBFC (Non-Banking Financial Company).
  • Involved in establishing proper review processes and documentation for functionality development.
  • Provided support and guidance for production and implementation issues.
  • Used JavaScript validation in JSP.
  • Used the Hibernate framework to access data from the back-end SQL Server database.
  • Used AJAX (Asynchronous JavaScript and XML) to implement a user-friendly and efficient client interface.
  • Used MDBs for consuming messages from JMS queues/topics.
  • Designed and developed the web application using the Struts framework.
  • Used ANT to compile and generate EAR, WAR, and JAR files.
  • Created test case scenarios for Functional Testing and wrote Unit test cases with JUnit.
  • Responsible for Integration, unit testing, system testing and stress testing for all the phases of project.

Environment: Java SE 6, Servlets, XML, HTML, JavaScript, JSP, Hibernate, Oracle 11g, SQL Navigator.
