Big Data Engineer Resume
St. Louis
SUMMARY:
- 6 years of total IT experience, with over 2 years in all phases of Hadoop development and 4+ years in development and support of database applications using SQL and PL/SQL.
- Functional experience in the Insurance Services and Financial Data Services domains.
- Expertise in data management and implementation of Big Data applications using Hadoop framework.
- Involved in designing, developing and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
- Excellent understanding/knowledge of Hadoop architecture and components such as Hadoop 2.0+, HDFS, MapReduce, YARN, Hive 0.12+, HBase 0.98+, Sqoop 1.4.2+, Spark 1.4.0+, Kafka 0.8.1+, ZooKeeper 3.4+, and Oozie 3.3+.
- Hands-on experience in the AWS environment using Kinesis Firehose, Kibana, and Elasticsearch.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HBase, Oozie, Hive, Sqoop, Pig, and Flume.
- Hands-on experience with Spark Core 1.6, Spark SQL 1.6, and Spark Streaming 1.6 for complex data transformations using Scala.
- Hands-on experience with multithreading and Akka Actors with the Play API.
- Experience in analyzing data using HiveQL, Pig Latin, HBase and custom MapReduce programs in Java.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Experience in importing and exporting data with Sqoop between HDFS and relational database systems.
- Familiar with cluster management and Security with Kerberos.
- Excellent understanding/knowledge of Core Java, Data Structures, Algorithms, Object-Oriented Design (OOD), and Design Patterns.
- Experienced in Database development, ETL, OLAP and OLTP
- Hands-on experience in automation testing using Selenium.
- Worked with build management tools such as SBT and Maven, and version control tools such as Git.
- Experience working with Continuous Integration (CI) tools such as Jenkins.
- Worked in a Test-Driven Development (TDD) environment using JUnit, ScalaTest, and HiveRunner.
- Experience in Agile Engineering practices.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimates, designing custom solutions, development, leading developers, producing documentation, and production support.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
TECHNICAL SKILLS:
Tools: Jenkins (Continuous Integration), SVN, Git (Version Control), JIRA
Big Data Ecosystems: Apache Hadoop 2.6.0, MapReduce v2, HDFS, Hive 0.14.0, Pig, HBase, Zookeeper, Sqoop 1.4.6, Kafka, Oozie, Impala
Spark Components: Spark Core 1.6.0, Spark SQL, Spark Streaming
Cloudera Distributions: CDH 5.8
Databases: Oracle, NoSQL (MongoDB, Cassandra, HBase)
Languages: Scala 2.11, Java 8, C++, Perl, Shell Scripting
Web Development: HTML, CSS, XML, JavaScript
Cloud Platform: Amazon Web Services (EC2, S3)
Methodologies: Agile Scrum
ALGORITHMS:
- Proficiency in programming with numerical algorithm packages such as NumPy, SciPy, OpenCV, and NLTK.
- Experienced in machine learning and statistical analysis with Python and MATLAB.
- Algorithms including K-Means, K-Nearest Neighbors, Linear/Logistic Regression, HMM, SVM, and Neural Networks.
- Skilled at data visualization with Tableau, matplotlib, and MATLAB.
PROFESSIONAL EXPERIENCE:
Confidential, St. Louis
Role: Big Data Engineer
Responsibilities:
- Developed real-time data pipelines with Kafka 0.8 to receive data from various financial services.
- Configured Spark Streaming 1.6 with Kafka to build the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (sketched below).
- Developed Scala 2.11 scripts and UDFs using both DataFrames/Datasets/SQL and RDDs in Spark for data aggregation and queries.
- Integrated HBase with Hive 0.13 and wrote HiveQL for data transformations.
- Transferred data from Hive to Tableau and created visualization reports for Business Intelligence requirements.
- Used Python to write machine learning algorithms that validate thresholds and support decision making.
- Developed workflow in Oozie 3.3 to automate the tasks of loading data and running scheduled batch jobs.
- Used the NLTK library with Python and Java for data modeling and linear regression.
- Performed unit testing for Spark and Spark Streaming with ScalaTest and JUnit.
- Used SVN for version control, JIRA for project tracking and Jenkins for continuous integration.
- Automated shell scripts for collecting cluster logs.
Environment: IBM SoftLayer, MapReduce, Hive, NoSQL, Flume, Kafka, REST API, Scala and Java EE, Spring, Cassandra, iBATIS, Oracle 11g, JIRA, HiveRunner, MRUnit, PigUnit, SVN, JUnit, ScalaTest, Docker
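A minimal Scala sketch of the kind of Spark Streaming 1.6 job described above: it consumes messages from a Kafka topic with the direct Kafka 0.8 API and persists parsed records into Cassandra via the DataStax Spark Cassandra Connector. The topic, keyspace, table, and record fields (learner-events, learner_ks, LearnerEvent) are hypothetical placeholders, not the actual learner data model.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import kafka.serializer.StringDecoder
import com.datastax.spark.connector.streaming._

// Hypothetical record type standing in for the learner data model.
case class LearnerEvent(learnerId: String, eventType: String, ts: Long)

object LearnerStreamJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("learner-stream")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed Cassandra host
    val ssc = new StreamingContext(conf, Seconds(10))       // 10-second micro-batches

    // Direct Kafka stream (Spark 1.6 / Kafka 0.8 API); broker and topic are placeholders.
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("learner-events"))

    // Parse each message (assumed CSV: learnerId,eventType,timestamp) and write to Cassandra.
    stream.map(_._2.split(","))
      .filter(_.length == 3)
      .map(f => LearnerEvent(f(0), f(1), f(2).toLong))
      .saveToCassandra("learner_ks", "learner_events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```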
Confidential, St. Louis
Role: Big Data Engineer
Responsibilities:
- Developed data pipelines with Flume 1.5 to ingest log data from various upstream sources into HDFS.
- Implemented Kafka 0.8 with Flume to import real-time streaming data into HBase.
- Designed and developed databases with Hive 0.13, HBase-integrated Hive tables, custom UDFs, and HiveQL for faster data processing and analytics (sketched below).
- Used Sqoop to move data between Hive and relational databases.
- Implemented security models on the Hadoop cluster.
- Configured Spark Streaming 1.6 with Hive for real-time data analytics per business requirements.
- Implemented XML to Avro file conversion modules to transform the data.
- Worked with the Avro data serialization system to handle JSON data formats.
- Developed workflow in Oozie 3.2 to automate the tasks of loading data and running scheduled batch jobs.
- Performed unit testing for Hive with HiveRunner, MRUnit.
- Used Git for version control, JIRA for project tracking and Jenkins for continuous integration.
Environment: AWS, EMR, EC2, S3, Spark, Scala, Java 8, Spark Streaming, JUnit, ScalaCheck, Kafka, Flume, HBase, Hive, Tableau, Oozie, Git, JIRA, Jenkins
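A minimal sketch of a custom Hive UDF of the kind referenced above, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API; the function name and normalization logic are hypothetical examples, not the production UDFs.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: normalize free-form account identifiers before joins and aggregations in HiveQL.
class NormalizeAccountId extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase.replaceAll("[^A-Z0-9]", ""))
  }
}
```

Once packaged into a JAR, a UDF like this is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, after which it can be called from HiveQL like any built-in function.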
Confidential
Role: Programmer Analyst
Responsibilities:
- Coded a Java Maven module to perform various Amazon S3 operations (sketched below).
- Wrote custom MapReduce programs to cleanse and enrich data in EMR.
- Loaded transformed data back to S3.
- Loaded data into a MarkLogic server in AWS.
- Coded JavaScript search operations to build a search application with MarkLogic.
- Queried for keyword searches with facets (number of years, URIs) against skill sets.
- Coded a Python module for URI-based analysis.
- Built a Python dictionary of predefined keywords from URI crawl data.
- Built a faceted search on top of data stored in MarkLogic.
Environment: Amazon EMR, S3, Java, JavaScript, Python, MarkLogic (NoSQL)
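The S3 module above was originally a Java Maven module; the sketch below shows the same kinds of operations in Scala (one language is used for all examples in this document) using the AWS SDK for Java v1. Bucket names, keys, and prefixes are placeholders.

```scala
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.GetObjectRequest
import java.io.File
import scala.collection.JavaConverters._

object S3Ops {
  // Client resolves credentials and region from the default provider chain.
  private val s3 = AmazonS3ClientBuilder.defaultClient()

  // Upload a transformed output file back to S3.
  def upload(bucket: String, key: String, file: File): Unit =
    s3.putObject(bucket, key, file)

  // List object keys under a prefix, e.g. an EMR job's output directory.
  def listKeys(bucket: String, prefix: String): Seq[String] =
    s3.listObjects(bucket, prefix).getObjectSummaries.asScala.map(_.getKey)

  // Download an object to a local file.
  def download(bucket: String, key: String, dest: File): Unit =
    s3.getObject(new GetObjectRequest(bucket, key), dest)
}
```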
Confidential, Hartford
Role: Programmer Analyst
Responsibilities:
- Developed complex database objects such as stored procedures, functions, packages, and triggers using Oracle, MySQL, and PL/SQL.
- Implemented and maintained branching and build/release strategies using SVN.
- Led configuration management and workflow development efforts for the development team.
- Automated the testing process using Selenium (sketched below).
- Developed a POC for the application's load-balance testing process.
- Tested the application by creating macros for large data sets.
- Applied a strong understanding of Java project structures.
- Used HTML and CSS to develop prototypes for the application.
- Good understanding of building the GUI using Struts 2.3.
- Built pre-install scripts using shell scripting and performed load-balance testing.
- Monitored the application and supported the production environment.
- Eliminated bugs using Lean Six Sigma concepts (Five Whys).
- Built binaries using C++, Perl, and GNU Make.
- Responsibilities included developing complex build, test, provisioning, security, and deployment systems, and providing support to a large community of developers and testers.
Environment: Java/J2EE, XML, WebLogic, SQL, PL/SQL, Perl scripts, shell scripts, Tomcat Application Server
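A minimal sketch of the kind of Selenium automation referenced above, driving the Selenium WebDriver Java API from Scala; the URL, element locators, and credentials are hypothetical placeholders.

```scala
import org.openqa.selenium.By
import org.openqa.selenium.chrome.ChromeDriver

object LoginSmokeTest {
  def main(args: Array[String]): Unit = {
    val driver = new ChromeDriver() // assumes chromedriver is available on the PATH
    try {
      // Hypothetical login page and element ids.
      driver.get("https://app.example.com/login")
      driver.findElement(By.id("username")).sendKeys("test-user")
      driver.findElement(By.id("password")).sendKeys("test-pass")
      driver.findElement(By.id("login")).click()
      assert(driver.getTitle.contains("Dashboard"), "login did not land on the dashboard")
    } finally {
      driver.quit()
    }
  }
}
```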
Confidential
Role: Programmer Analyst
Responsibilities:
- Developed complex database objects such as stored procedures, functions, packages, and triggers using Oracle, MySQL, and PL/SQL.
- Resolved production tickets related to the database as well as Oracle Forms.
- Coordinated with multiple interface teams and assisted them in accessing dependent functionality.
- Led configuration management and workflow development efforts for the development team.
- Automated the testing process using Selenium.
- Developed a POC for the application's load-balance testing process.
- Tested the application by creating macros for large data sets.
- Applied a strong understanding of Java project structures.
- Used HTML and CSS to develop prototypes for the application.
- Good understanding of building the GUI using Struts 2.3.
- Built pre-install scripts using shell scripting and performed load-balance testing.
- Monitored the application and supported the production environment.
Environment: Java/J2EE, XML, WebLogic, SQL, PL/SQL, Perl scripts, shell scripts, Tomcat Application Server