Big Data Engineer Resume
San Mclean, VA
SUMMARY
- Software Engineer with 8+ years of professional experience in the IT industry, with emphasis on software development for financial applications, telecom projects, Big Data and web applications. Strong understanding of the SDLC and its methodologies and how to apply them to achieve better-quality results. Extensive knowledge of problem-solving techniques and excellent analytical skills, coupled with good communication and interpersonal skills.
- 8+ years of experience developing code and procedures for applications in different environments in the finance, web and Big Data fields.
- Web application developer in Java and Ruby on Rails with 2 years of academic and professional experience developing client-side interfaces using JavaScript, jQuery, HTML5 and CSS3.
- Experienced developer in Java and J2EE technologies such as Spring MVC, JDBC, JSP, Servlets, Hibernate and JMS, with experience practicing TDD across a range of frameworks.
- Working experience in IBM® DB2®, UNIX Shell Scripting, Hadoop, HBase, Ember.js, HTML5, CSS3 and other JavaScript technologies.
- Excellent understanding of REST APIs, MVC frameworks, Data Warehousing and ETL processes.
- Advanced knowledge of NoSQL and RDBMS, SQL queries, database sharding and partitioning, and SQL scripting for data validation, gained while working as a Data Engineer.
- Working knowledge of software development processes such as Scrum and Agile.
- Advanced knowledge of Software Design Patterns, Big Data Technologies (Apache Hadoop, Flume, Kafka, Storm, Spark & Spark Streaming) and Cloud Technologies & design.
- Working knowledge of Spark Streaming, Batch processing and DataFrames using Spark & SparkSQL.
- Good knowledge of Multithreading, thread synchronization & Spring MVC.
- Technical expertise in Hadoop development, Live Stream processing using Flume, Kafka, Storm & Spark, HDFS, HBase and Hive.
- Good understanding of Hadoop and Spark internals.
- Working experience in building shippable products using Chef and Docker.
- Experience using Maven & Ant build tools.
- Excellent analytical, problem solving and communication skills.
- A quick learner, task owner and a very good team player.
- Ability to manage a team independently and coordinate its work through to successful completion.
- Ability to multi-task and work under pressure in a fast-paced environment.
- Submitted a proposal for an 'Expertise model' to the Chief Architect of Confidential Inc during a summer internship; the company accepted it for integration into its product. The product was launched last year.
TECHNICAL SKILLS
Operating Systems: MS Windows, Linux (Fedora Core, Red Hat), AIX, Unix, OS X
Programming Languages: C, C++, Java, Ruby + Ruby on Rails, HTML5, CSS3
Scripting Languages: Unix Shell Scripting, Perl Scripting (Basic), JavaScript, Python
RDBMS/SQL: IBM DB2, MySQL, PL/SQL, NoSQL (MongoDB, Cassandra, Riak, Hadoop HBase, Hive, Pig, Redis), PostgreSQL
IDE: IntelliJ IDEA, Eclipse, NetBeans
App Server: Apache Tomcat, JBoss
Web Services: REST, SOAP
Build tools: Apache Ant, Maven
ORM: JDBC, JPA, Hibernate, Mongoid, MongoMapper
Source Code Versioning: GitHub
Big Data Technologies: Apache Hadoop 2.4, CDH4.5, Apache Kafka, Spark, Storm and Flume
PROFESSIONAL EXPERIENCE
Big Data Engineer
Confidential, San Mclean, VA
Responsibilities:
- Lead a team of software engineers to build the MADtech team from the ground up.
- MADtech is the Marketing and Decision-making team, responsible for processing large amounts of data to generate results that the marketing team uses to make better decisions about marketing credit cards to customers.
- Research Apache Flume, Kafka, Storm and Spark to understand the compatibility and interoperability among these components in order to build a decision-making engine for the MAD team from scratch.
- Design a real-time computation engine using Flume, Kafka, Storm/Spark and Esper (a Complex Event Processing engine) to provide credit card customers with a Credit Line Increase.
- Write scripts to fetch data from RDBMS, Teradata and multiple other sources, transfer it using Flume and Kafka, push the stream into a Storm topology for processing, and then persist the data in MongoDB.
- Write Hadoop MapReduce programs and create Spark DataFrames, DStreams and RDDs, as well as Storm topologies, to process large data sets for better decision-making.
- Perform ETL on data from the Dev and QA boxes to stage it for micro-batching in Spark.
- Continuous integration using Jenkins and GitHub.
- Design RESTful web services using Spring MVC and Hibernate to build an internal portal for processing the dataset.
- Build Docker images and containers to ship a deployable product.
- Involved in the integration of Spring for implementing Dependency Injection (DI/IoC).
- Write complex SQL queries using joins and case statements to retrieve data from Teradata for analysis, which is then sent for live stream processing using streaming technologies.
Big Data Hadoop Engineer
Confidential, Mountain View, California
Responsibilities:
- Research the Apache Hadoop 2.4 ecosystem and document the features necessary to migrate the Hadoop services from version 1.3 to 2.4.
- Write Hadoop services and APIs to migrate the Hadoop distribution from 1.3 to 2.4.
- Test the features of the new Hadoop distribution and resolve bugs and issues pertaining to it.
- Test the entire Orchestrator product for different Hadoop distributions like CDH4.5, HDP1.3 and Apache Hadoop 1.3, 2.4.
- Used Scrum (Agile) as the development methodology.
- Work with JSON documents by fetching data from HBase, Vertica and log files to find user patterns; work with XML parsers and configure XML files such as web.xml.
- Involved in the integration of Spring for implementing Dependency Injection (DI/IoC).
- Set up the build environment by writing the Maven pom.xml, and handled build configuration and deployment of the application on all servers.
- Implemented web services following a RESTful architecture.
- Developed a Message Driven Bean that uses JMS to manage backend transactions.
- Prepared JUnit test cases during the unit testing and system testing phases.
- Integrated all the developed modules.
- Design and implement an always-running SaaS solution so that customers can access the resources online from any location without interruption.
Confidential, Mountain View, California
Responsibilities:
- Work on UI development using Ember.js to design the discussion forum for the product and user object modification to decide the expertise level of the user.
- Write server-side code in Java and Ruby on Rails to take requests from the user and return appropriate responses to Ember Data.
- Practice test-driven development using JUnit and RSpec.
- Apply machine learning to establish the relationship between posts and comments (spam filtering) and implement the rating feature that decides the expertise level of the user.
- Involved in all the phases of SDLC including Requirements Collection, Design & Analysis of the Customer Specifications, Development and Customization of the Application.
- Used the Spring MVC (Model View Controller) framework and its Dependency Injection.
- Used the Hibernate framework for object-relational mapping and created the required mapping classes.
- Back-end coding and development using Java Collections (Set, List, Map), multithreading, Servlets, Actions, Action Forms, JavaBeans, exception handling, etc.
- Used design patterns such as Façade, Singleton and Factory.
- Deployed the Java application using a Maven build script.
- Write Hadoop MapReduce programs to find patterns in user activity across the multiple courses taken by a user, and use them to perform machine learning.
- Test existing functionality by creating knowledge packs and knowledge units under different roles, and perform source code versioning using GitHub.
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different Big Data analytic tools including Pig, Hive, the HBase database and Sqoop.
- Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Coordinated with business customers to gather business requirements, interacted with technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Extensively involved in Design phase and delivered Design documents.
- Involved in testing and coordinated with the business on user testing.
- Imported and exported data to and from HDFS and Hive using Sqoop.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Experienced in defining job flows.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Experienced in managing and reviewing the Hadoop log files.
- Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Loaded and transformed large sets of structured and semi-structured data.
- Responsible for managing data coming from different sources.
- Involved in creating Hive tables, loading data and writing Hive queries.
- Used the Apache Hadoop environment provided by Cloudera.
- Created Data model for Hive tables.
- Involved in Unit testing and delivered Unit test plans and results documents.
- Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
- Worked on Oozie workflow engine for job scheduling.
Environment: Hadoop, Hive, MapReduce, Pig, Sqoop.
Application Developer
Confidential
Responsibilities:
- Web-based reporting and standard analytics that provide card-member-oriented business insights to the Merchant.
- Document requirements from the Project Manager and design high-level UML diagrams to depict the flow of the system.
- Write stored procedures in Java, UNIX and DB2 to fetch card member information to be displayed in reports.
- Coordinate with data warehouse teams to extract raw data from the production environment using ETL tools.
- Perform UAT along with Team Leads before delivering the product to the customer.
Confidential
Application Programmer
Responsibilities:
- Design UML diagrams depicting the flow of the system and the approach to fetching information from the database.
- Design and develop SQL queries and stored procedures to fetch information from the database for research on the data the client uses to promote offers.
- Write shell scripts to process and update records in the database as per instructions from the BAs.
- Stage data from the database weekly to track changes to the data.
- Develop scripts to run jobs on the production database and maintain the scripts for better performance.
- Technologies used: IBM DB2, Java, UNIX Shell Scripting, MySQL, SQL.
Confidential
Member Technical Staff
Responsibilities:
- Design the dashboard using Java Swing.
- Write unit test cases.
- Document the full development life cycle.
- Write manual test cases for GUI testing in a Word document.
Environment: Java Swing, Core Java
