Sr. Big Data Engineer Resume
SUMMARY
- Experienced in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HQL (HiveQL); a minimal sketch follows this summary.
- Expertise in Core Java and JDBC, proficient with Java APIs for application development; experience includes web-based applications built with Core Java, JDBC, Java Servlets, JSP, the Struts framework, Hibernate, HTML, JavaScript, XML, and Oracle.
- Good knowledge of and experience with Amazon Web Services (AWS) offerings such as EMR and EC2, which provide fast, efficient processing for Teradata big data analytics.
- Expertise in big data architectures such as distributed Hadoop systems (Azure, Hortonworks, Cloudera), MongoDB, and NoSQL.
- Good knowledge on AWS data services including Kinesis, Athena, Redshift and EMR.
- Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns, and AJAX calls; good working experience with application and web servers such as JBoss and Apache Tomcat.
- Experienced in installing, configuring, supporting, and monitoring Hadoop clusters using Apache and Cloudera distributions and AWS.
- Good experience with Tableau for data visualization and analysis of large data sets and drawing conclusions from them; leveraged and integrated Google Cloud Storage and BigQuery, connected to Tableau for end-user web-based dashboards and reports.
- Excellent hands-on experience with Hadoop/big data technologies for storage, querying, processing, and analysis of data.
- Experienced in developing big data projects with open-source tools such as Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce, and in installing, configuring, supporting, and managing Hadoop clusters.
- Strong hands-on experience with AWS services, including but not limited to EMR, S3, EC2, Route 53, RDS, ELB, DynamoDB, and CloudFormation.
- Hands-on experience with the Hadoop ecosystem and related big data technologies, including Spark, Kafka, HBase, Scala, Pig, Impala, Sqoop, Oozie, Flume, and Storm.
- Worked with Spark SQL, Spark Streaming, and the core Spark API to build data pipelines.
- Very good experience and knowledge of AWS services such as EMR and EC2; successfully loaded files into HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.
- Excellent understanding of big data infrastructure: distributed file systems (HDFS) and the MapReduce parallel-processing framework.
- Extensive knowledge of IDE tools such as MyEclipse, RAD, IntelliJ, and NetBeans.
- Expert in Amazon EMR, Spark, Kinesis, S3, ECS, ElastiCache, DynamoDB, and Redshift.
- Experience installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Experience in working with different data sources like Flat files, XML files and Databases.
- Experience in database design, entity relationships, database analysis, SQL programming, and writing PL/SQL stored procedures, functions, packages, and triggers in Oracle.
- Very good knowledge of Splice Machine.
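As an illustration of the Python-into-HiveQL pattern noted in the first summary bullet, below is a minimal sketch of a streaming-style UDF wired into Hive through the TRANSFORM clause. The table and column names (clickstream, user_id, url) are hypothetical, not taken from any project above.

```python
#!/usr/bin/env python
# normalize_urls.py -- a minimal Hive streaming "UDF" sketch.
# Hive pipes each selected row to stdin as tab-separated text and reads
# tab-separated rows back from stdout, e.g. (table/columns hypothetical):
#
#   ADD FILE normalize_urls.py;
#   SELECT TRANSFORM (user_id, url)
#       USING 'python normalize_urls.py'
#       AS (user_id STRING, domain STRING)
#   FROM clickstream;
import sys
from urllib.parse import urlparse

for line in sys.stdin:
    user_id, url = line.rstrip("\n").split("\t")
    # Reduce each URL to its lowercased host so downstream HQL can group by domain.
    domain = urlparse(url).netloc.lower() or "unknown"
    print(f"{user_id}\t{domain}")
```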
TECHNICAL SKILLS
Big Data Ecosystem: MapReduce, HDFS, Hive, Pig, Sqoop, Flume, HDP, Oozie, ZooKeeper, Spark, Kafka, Storm, Hue, Athena
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks
Databases: Oracle 12c/11g, MySQL, MS-SQL, Teradata, HBase, MongoDB, Cassandra
Version Control: GIT, GitLab, SVN
Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX- WS
NoSQL Databases: HBase, Cassandra and MongoDB
Programming Languages: Java, Python, Scala, SQL, PL/SQL, HiveQL, UNIX Shell Scripting
Methodologies: Software Development Lifecycle (SDLC), Waterfall Model and Agile, STLC (Software Testing Life cycle) & UML, Design Patterns (Core Java and J2EE)
Web Technologies: JavaScript, CSS, HTML, JSON and JSP
Operating Systems: Windows, UNIX/Linux and Mac OS
Build Management Tools: Maven, Ant.
IDE & Command Line Tools: Eclipse, IntelliJ, Toad and NetBeans
Other Technologies: Teradata, Tableau, AWS
PROFESSIONAL EXPERIENCE
Confidential
Sr. Big Data Engineer
Responsibilities:
- Implemented solutions for ingesting data from various sources and processing data at rest using big data technologies such as Hadoop, the MapReduce framework, HBase, and Hive.
- Managed and led the development effort with a diverse internal and overseas team; designed, architected, and implemented complex projects handling considerable data volumes (GB to PB scale).
- Designed and deployed the full SDLC of an AWS Hadoop cluster based on the client's business needs; loaded and transformed large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Designed AWS architecture and cloud migrations covering AWS EMR, DynamoDB, Redshift, and event processing using Lambda functions.
- Performed data profiling and transformation on raw data using Pig, Python, and Java, and developed predictive analytics using the Apache Spark Scala APIs.
- Used AWS Redshift, S3, Redshift Spectrum, and Athena to query large amounts of data stored in S3 and create a virtual data lake without a separate ETL process.
- Used Spring core annotations for dependency injection (Spring DI), Spring MVC for REST APIs, and Spring Boot for microservices.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra); responsible for importing log files from various sources into HDFS using Flume.
- Analyzed data using HiveQL to generate per-payer reports from payment summaries for transmission to payers.
- Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
- Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning (see the streaming sketch following this role's environment line).
- Explored Spark to improve performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Explored DAGs, their dependencies, and logs using Airflow pipelines for automation, and used Apache Airflow to schedule and run the DAGs (see the Airflow sketch following this role's environment line).
- Performed big data analysis using Pig and user-defined functions (UDFs); created Hive external tables, loaded data into them, and queried the data using HQL.
- Implemented Spark GraphX application to analyze guest behavior for data science segments.
- Enhanced a traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
- Involved in migrating data from existing RDBMSs (Oracle and SQL Server) into Hadoop using Sqoop for processing.
- Developed shell, Perl, and Python scripts to automate and provide control flow for Pig scripts; scheduled the Airflow workflow engine to run multiple Hive and Pig jobs using Python.
- Used Athena with compressed, partitioned data converted into columnar formats for better performance.
- Developed a prototype for big data analysis using Spark (RDDs and DataFrames) and the Hadoop ecosystem with CSV, JSON, and Parquet files on HDFS.
- Developed HiveQL scripts for transformation logic and for loading data from the staging zone into the landing and semantic zones.
- Maintained and worked with a data pipeline that transfers and processes several terabytes of data using Spark, Scala, Python, Apache Kafka, Pig/Hive, and Impala.
- Created Oozie workflow and coordinator jobs to kick off Hive jobs on time based on data availability; worked with the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract data on a timely basis.
- Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector.
- Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
- Used ETL methodology for supporting data extraction, transformations and loading processing, in a complex MDM using Informatica.
- Worked on Dimensional and Relational Data Modeling using Star and Snowflake Schemas, OLTP/OLAP system, Fact and Dimension tables, Conceptual, Logical and Physical data modeling using Erwin r9.6.
- Completed a POC of the AWS Athena service and used Athena to run ad hoc queries using ANSI SQL.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Responsible for designing and deploying EDW application solutions, optimizing processes, and defining and implementing best practices.
Environment: Big Data, Spark, YARN, Hive, Pig, JavaScript, JSP, HTML, Ajax, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, ETL, AWS S3, AWS Glue, Oozie, Zookeeper, SQL, Spring Boot, Athena, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL.
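The real-time streaming ETL work above used Spark, Kafka, Scala, and Hive; as a minimal, hedged sketch of that pattern (shown in PySpark rather than Scala for brevity), the following reads a Kafka topic, parses JSON events, and appends the filtered result to Parquet. The broker address, topic, schema, and paths are placeholders, not actual project values, and the job assumes the spark-sql-kafka package is on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("streaming-etl-sketch").getOrCreate()

# Hypothetical schema for the JSON payload carried on the Kafka topic.
schema = StructType([
    StructField("guest_id", StringType()),
    StructField("event", StringType()),
    StructField("amount", DoubleType()),
])

# Read the Kafka topic as a streaming DataFrame (broker/topic are placeholders).
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# Parse the JSON value column and keep only purchase events.
events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*")
          .filter(col("event") == "purchase"))

# Append the parsed stream to Parquet with a checkpoint for recovery.
query = (events.writeStream.format("parquet")
         .option("path", "/data/landing/purchases")
         .option("checkpointLocation", "/data/checkpoints/purchases")
         .outputMode("append")
         .start())
query.awaitTermination()
```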
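Likewise, a minimal Apache Airflow sketch of the DAG scheduling described above, assuming Airflow 2.x: a daily DAG chaining a Sqoop import and a Hive script with BashOperator. The DAG id, JDBC connection string, table, and file paths are illustrative only.

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

# A daily pipeline: land data with Sqoop, then transform it with Hive.
with DAG(
    dag_id="daily_ingest_and_transform",   # illustrative name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:

    sqoop_import = BashOperator(
        task_id="sqoop_import",
        bash_command="sqoop import --connect jdbc:oracle:thin:@//db:1521/ORCL "
                     "--table ORDERS --target-dir /data/staging/orders",
    )

    hive_transform = BashOperator(
        task_id="hive_transform",
        bash_command="hive -f /opt/etl/transform_orders.hql",
    )

    # Run the Hive transform only after the Sqoop import succeeds.
    sqoop_import >> hive_transform
```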
Confidential
Sr. Hadoop Developer
Responsibilities:
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Developed Sqoop scripts for the extractions of data from various RDBMS databases into HDFS.
- Developed scripts to automate the workflow of various processes using Python and shell scripting.
- Installed and configured Hadoop ecosystem components such as Hive, Oozie, and Sqoop on a Cloudera Hadoop cluster, and helped with performance tuning and monitoring.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Developed data pipelines using Pig and Hive from Teradata and DB2 data sources; these pipelines included customized UDFs to extend the ETL functionality.
- Created data pipelines with Apache NiFi processor groups and multiple processors for flat-file and RDBMS sources as part of a POC on Amazon EC2.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 2.0.0 for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
- Designed and developed Java EE software components using Spring Boot and RESTful web services, and implemented microservices using Spring Boot, Spring Cloud, and Spring Data.
- Wrote Hive join queries to fetch information from multiple tables, wrote multiple MapReduce jobs to collect output from Hive, and used Hive to analyze partitioned and bucketed data and compute various metrics for dashboard reporting.
- Developed MapReduce programs using Java and Python to parse raw data and store the refined data in Hive.
- Used AWS Cloud and On-Premise environments with Infrastructure Provisioning/ Configuration.
- Worked on writing Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP Lite module and WSDL.
- Used UDFs to implement business logic in Hadoop, using Hive to read, write, and query Hadoop data in HBase.
- Created MDM and OLAP data architecture, analytical data marts, and cubes optimized for reporting; involved in logical modeling using dimensional modeling techniques such as star schema and snowflake schema.
- Used the Oozie workflow engine to run multiple Hive and Pig scripts, with Kafka for real-time processing of data sets in HDFS storage, and loaded log file data directly into HDFS using Flume.
- Developed an end-to-end workflow to build a real-time dashboard using Kibana, Elasticsearch, Hive, and Flume.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Developed Python MapReduce programs for log analysis and designed an algorithm for detecting fake reviews using Python (see the MapReduce streaming sketch following this role's environment line).
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
- Extracted data from MySQL and AWS Redshift into HDFS using Sqoop, and worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
- Implemented a proof of concept deploying the product on Amazon Web Services (AWS) and in on-premise environments with infrastructure provisioning/configuration.
- Managed multiple ETL development teams for business intelligence and Master data management initiatives.
- Involved in developing the MapReduce framework, writing and scheduling MapReduce queries, and developing code for importing and exporting data into HDFS and Hive using Sqoop.
- Used Oozie to design workflows and schedule various jobs in the Hadoop ecosystem.
- Developed MapReduce programs in Java to apply business rules to the data and optimized them using various compression formats and combiners.
- Used Spark SQL to create DataFrames by loading JSON data and analyzing it, and developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, Zookeeper, MySQL, Eclipse, Spring Boot, DynamoDB, PL/SQL and Python.
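As a minimal sketch of the Python MapReduce log-analysis work mentioned above, here is a Hadoop Streaming mapper/reducer pair that counts HTTP status codes. The assumption that the status code is the ninth space-delimited field follows the common Apache access-log format; it is not a confirmed detail of the project.

```python
#!/usr/bin/env python
# mapper.py -- emit "status\t1" per access-log line (Hadoop Streaming).
# Assumes Apache common/combined log format: the status code is field 9.
import sys

for line in sys.stdin:
    parts = line.split()
    if len(parts) > 8 and parts[8].isdigit():
        print(f"{parts[8]}\t1")
```

```python
#!/usr/bin/env python
# reducer.py -- sum the counts per status code (input arrives sorted by key).
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{count}")
        current_key, count = key, 0
    count += int(value)
if current_key is not None:
    print(f"{current_key}\t{count}")
```

A typical run would ship both scripts to the cluster via the hadoop-streaming JAR with -mapper mapper.py and -reducer reducer.py over the log directory in HDFS.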
Confidential, TX
Sr. Java/J2EE Developer
Responsibilities:
- Involved in a full life cycle Object Oriented application development - Object Modeling, Database Mapping, GUI Design.
- Developed the J2EE application based on the Service Oriented Architecture and used Design Patterns like Singleton, Factory, Session Facade and DAO.
- Developed code using newer Java features such as annotations, generics, the enhanced for loop, and enums.
- Developed Use Case diagrams, Class diagrams and Sequence diagrams to express the detail design.
- Worked with EJB (Session and Entity) to implement the business logic to handle various interactions with the database.
- Skilled in using collections in Python for manipulating and looping through different user defined objects.
- Implemented a high-performance, highly modular, load-balancing broker in C with ZeroMQ and Redis.
- Used Spring and Hibernate to implement IoC, AOP, and ORM for back-end tiers; created and injected Spring services, Spring controllers, and DAOs to achieve dependency injection and wire business-class objects.
- Developed a fully automated continuous integration system using Git, Jenkins, MySQL and custom tools developed in Python and Bash.
- Part of a team implementing REST APIs in Python using micro-frameworks such as Flask (a minimal sketch follows this role's environment line).
- Used Spring inheritance to develop beans from already developed parent beans, and used the DAO pattern to fetch data from the database via Hibernate for various database operations.
- Used SOAP Lite module to communicate with different web-services based on given WSDL.
- Worked on evaluating and comparing different tools for test data management with Hadoop.
- Helped and directed the testing team to get up to speed on Hadoop application testing; used Hibernate transaction management, Hibernate batch transactions, and caching concepts.
- Modified Spring controller and service classes to support the introduction of the Spring framework.
- Created complex SQL Queries, PL/SQL Stored procedures, Functions for back end and developed various generic JavaScript functions used for validations.
- Developed screens using HTML5, CSS, jQuery, JSP, JavaScript, AJAX and ExtJS and used Aptana Studio and Sublime to develop and debug application code.
- Used Rational Application Developer (RAD) which is based on Eclipse, to develop and debug application code and created user-friendly GUI interface and Web pages using HTML, AngularJS, JQuery and JavaScript.
- Used the Log4j utility to generate run-time logs, wrote SAX and DOM XML parsers, and used SOAP to send and receive data from the external interface.
- Deployed business components into WebSphere Application Server and developed Functional Requirement Document based on users' requirement.
Environment: Core Java, J2EE, JDK 1.6, Python, Spring 3.0, Hibernate 3.2, Tiles, AJAX, JSP 2.1, Eclipse 3.6, IBM WebSphere 7.0, XML, XSLT, SAX, DOM Parser, HTML, UML, Oracle 10g, PL/SQL, JUnit.
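For the Flask REST API work mentioned above, a minimal sketch of the kind of endpoint involved; the /orders resource, its fields, and the in-memory store are hypothetical stand-ins for the real application and its persistence layer.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory store standing in for the real persistence layer (hypothetical).
ORDERS = {1: {"id": 1, "status": "NEW"}}

@app.route("/orders/<int:order_id>", methods=["GET"])
def get_order(order_id):
    # Return the order as JSON, or a 404 if the id is unknown.
    order = ORDERS.get(order_id)
    if order is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(order)

@app.route("/orders", methods=["POST"])
def create_order():
    # Accept a JSON body, assign the next id, and echo the created resource.
    payload = request.get_json(force=True)
    order_id = max(ORDERS) + 1
    ORDERS[order_id] = {"id": order_id, **payload}
    return jsonify(ORDERS[order_id]), 201

if __name__ == "__main__":
    app.run(port=5000)
```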
Confidential, TX
Java Developer
Responsibilities:
- Implemented Spring MVC architecture and Spring Bean Factory using IOC, AOP concepts.
- Gathered the requirements and designed the application flow for the application.
- Used HTML, JavaScript, JSF 2.0, AJAX and JSP to create the user interface.
- Involved in writing Maven builds for building and configuring the application.
- Developed Action classes for the system as a feature of Struts and performed both server-side and client-side validations.
- Developed EJB component to implement business logic using Session and Message Bean.
- Developed the code using core Java concepts, the Spring Framework, JSP, Hibernate 3.0, JavaScript, XML, and HTML.
- Used Spring Framework to integrate with Struts web framework, Hibernate.
- Extensively worked with Hibernate to connect to database for data persistence and integrated Activate Catalog to get parts using JMS.
- Used Log4j to log both user-interface and domain-level messages.
- Extensively worked with Struts for middle tier development with Hibernate as ORM and Spring IOC for Dependency Injection for the application based on MVC design paradigm.
- Created the struts-config.xml file to manage page flow and developed views with HTML, CSS, and JavaScript.
- Performed Unit testing for modules using Junit and played an active role in preparing documentation for future reference and upgrades.
- Implemented the front end using JSP, HTML, CSS and JavaScript, JQuery, AJAX for dynamic web content.
- Worked in an Agile environment using Scrum as the methodology, responsible for delivering potentially shippable product increments at the end of each sprint.
- Involved in Scrum meetings that allow clusters of teams to discuss their work, focusing especially on areas of overlap and integration.
Environment: Java 1.4, JSP, Servlets, JavaScript, HTML 5, AJAX, JDBC, JMS, EJB, Struts 2.0, Spring 2.0, Hibernate 2.0, Eclipse 3.x, WebLogic 9, Oracle 9i, JUnit, Log4j