Sr. Hadoop Developer Resume
Austin, TX
SUMMARY
- Over 9 years of experience in Development, Design, Integration, and Presentation with Java, along with 4 years of Big Data/Hadoop experience with Hadoop ecosystem tools and technologies such as Hive, Pig, Flume, Sqoop, ZooKeeper, HBase, Spark, Kafka, Python and AWS.
- Experienced in developing web-based GUIs using JavaScript, JSP, HTML, jQuery, XML and CSS, and in working with servers such as JBoss and Apache Tomcat 6.0/7.0/8.0.
- Architecting and implementing Portfolio Recommendation Analytics Engine using Hadoop MR, Oozie, Spark SQL, Spark MLlib and Cassandra.
- Technologies extensively worked on during my tenure in Software Development are Struts, Spring, CXF REST API, Web services, SOAP, XML, JMS, JSP, JNDI, Apache, Tomcat, JDBC and various databases like Oracle and Microsoft SQL Server.
- Excellent understanding of Hadoop architecture and underlying framework including storage management.
- Expertise in architecting Big Data solutions using data ingestion and data storage.
- Experienced in working with NoSQL databases (HBase, Cassandra and MongoDB), database performance tuning and data modeling.
- Strong Experience in Front End Technologies like JSP, HTML5, JQuery, JavaScript, CSS3.
- Experienced in using Spark to improve the performance of, and optimize, existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
- Experienced in building high-performance, reliable distributed applications in Java and Scala with Akka.
- Knowledge and experience in job work-flow scheduling and monitoring tools like Oozie and Zookeeper.
- Good experience in Shell programming.
- Knowledge of configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.
- Experienced in developing enterprise applications with J2EE/MVC architecture on application and web servers.
- Experience in development of Big Data projects using Hadoop, Hive, HDP, Pig, Flume, Storm and MapReduce open source tools/technologies.
- Architecting, solutioning and modeling DI (Data Integrity) platforms using Sqoop, Flume, Kafka, Spark Streaming, Spark MLlib and Cassandra.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Strong expertise in Amazon AWS EC2, DynamoDB, S3, Kinesis and other services.
- Expertise in Big Data architecture such as Hadoop (Azure, Hortonworks, Cloudera) distributed systems, MongoDB and NoSQL.
- Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
- Experienced in application development using Java, J2EE, JDBC, Spring and JUnit.
- Experienced in using various Hadoop infrastructures such as MapReduce, Hive, Sqoop, and Oozie.
- Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin and Airflow.
- Experienced in testing data in HDFS and Hive for each transaction of data.
- Strong experience in working with databases like Oracle 12c/11g/10g/9i, DB2, SQL Server and MySQL, and proficiency in writing complex SQL queries.
- Experienced in using database tools like SQL Navigator, TOAD.
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, Hive, Pig, HBase, MapReduce, ZooKeeper, Scala, Akka, Kafka, Storm, MongoDB, Sqoop, Oozie, Flume.
Languages: Java, J2EE, HQL, R, Python, XPath, Spark, PL/SQL, Pig Latin.
Operating Systems: Linux, Windows, UNIX, Ubuntu, CentOS, Sun Solaris.
NoSQL Databases: MongoDB, DynamoDB, Cassandra.
Web Technologies: HTML, XML, DHTML, XHTML, CSS, XSLT.
Web/Application servers: Apache Tomcat 6.0/7.0/8.0, JBoss.
Frameworks: MVC, Struts, Spring, Hibernate.
Databases: Microsoft Access, MS SQL, Oracle 12c/11g/10g/9i.
AWS: AWS, EC2, S3, SQS.
PROFESSIONAL EXPERIENCE
Confidential, Austin, TX
Sr. Hadoop Developer
Responsibilities:
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Developing Sqoop scripts to import and export data from relational sources and handling incremental loading of customer and transaction data by date.
- Migrating existing Java application into microservices using Spring Boot and Spring Cloud.
- Working knowledge in different IDEs like Eclipse, Spring Tool Suite.
- Working knowledge of using GIT, ANT/Maven for project dependency / build / deployment.
- Developing simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Developing Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Working as a part of AWS build team.
- Creating, configuring and managing S3 buckets (storage).
- Experience with AWS EC2, EMR, Lambda and CloudWatch.
- Importing the data from different sources like HDFS/HBase into Spark RDD.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Experienced in implementing Spark RDD transformations and actions to implement business analysis.
- Migrating HiveQL queries on structured data to Spark SQL to improve performance.
- Optimizing MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Working on partitioning Hive tables and running the scripts in parallel to reduce run-time of the scripts.
- Working on data serialization formats for converting complex objects into sequences of bits using Avro, Parquet, JSON and CSV formats.
- Responsible for analyzing and cleansing raw data by performing Hive/Impala queries and running Pig scripts on data.
- Administering, installing, upgrading and managing distributions of Hadoop, Hive and HBase.
- Involved in troubleshooting and performance tuning of Hadoop clusters.
- Creating Hive tables, loading data and writing Hive queries that run within the map.
- Used Spark Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model, which gets the data from Kafka in near real time.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Developing Spark scripts by using Python shell commands as per the requirement to read/write JSON files.
- Optimizing existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs to read/write JSON files.
- Developing a Spark ingestion process to extract 1 TB of data on a daily basis.
- Performing data manipulations using various Talend components like tMap, tJavaRow, tJava, tOracleRow, tOracleInput, tOracleOutput, tMSSQLInput and many more.
- Analyzing the source data to assess data quality using Talend Data Quality.
Environment: Big Data, Spark, YARN, Hive, Pig, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Hortonworks, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL & Talend.
Confidential, Tampa, FL
Sr. Hadoop Developer
Responsibilities:
- Worked on a live 52-node Hadoop cluster running Hortonworks Data Platform (HDP 2.2).
- Developed Spark SQL to load tables into HDFS to run select queries on top.
- Used Spark Streaming to divide streaming data into batches as an input to the Spark engine for batch processing.
- Engineered ETL standards for Hadoop data pipelines and automated end-to-end data ingestion using Falcon, Sqoop and Oozie.
- Led design and implementation of Store Traffic Data Analysis - an ETL solution to consolidate customer traffic data, sales data and employee workforce data to compute store close rate for about 1800 stores on a daily basis.
- Migrated existing MapReduce programs to Spark models using Python. Developed predictive analytics using Apache Spark Scala APIs.
- Implemented various Data Quality rules to ensure traffic data meets quality standards as outlined by analytics stakeholders.
- Implemented Apache Storm spouts and bolts to process data by creating topologies.
- Developed Imputation Models in Java using Apache Crunch to substitute values for missing or improper traffic data.
- Realized various initiatives from Apache Software Foundation, vetted new frameworks and built Proof of Concepts.
- Developed workflows to cleanse and transform raw data into useful information and load it to a Kafka queue to be loaded into HDFS and NoSQL databases.
- Developed Sqoop jobs both to import data into HDFS from relational database management systems such as Teradata and DB2 and to export data from HDFS to Teradata.
- Developed workflows for the complete end-to-end ETL process, starting with getting data into HDFS, validating and applying business logic, storing clean data in Hive external tables, exporting data from Hive to RDBMS sources for reporting, and escalating any data quality issues.
- Built scalable distributed data solutions using Hadoop. Developed MapReduce jobs written in Java to apply the business logic.
- Developed Pig functions to preprocess the data for analysis. Developed Spark scripts by using Scala shell commands as per the requirement.
- Created Oozie workflows to sqoop the data from source to HDFS and then to target tables.
- Created HBase tables to store different formats of data as a backend for user portals.
- Analyzed system failures, identified their root causes and recommended courses of action.
- Functioned as the point of contact for tracking issues and communicating them to the vendors and all other stakeholders. Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Developed utilities in Python to be used by ingestion workflows as part of Data Ingestion Process.
- Utilized Oozie workflows to run Pig and Hive jobs. Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Developed customized classes for serialization and deserialization in Hadoop.
Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, ZooKeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.
Confidential
Systems Analyst
Responsibilities:
- Understood and analyzed business requirements to develop a credit check module using Servlets, JSP and Core Java components on the WebLogic application server.
- Developed new screens/menu depending on the business requirements.
- Responsible for designing Rich user Interface Applications using JavaScript, CSS, HTML.
- Extensively worked with XSD, XSL/XSLT, and XML to navigate in XML documents, and SAX to process and parse XML files.
- Used the JUnit framework for unit testing of all the Java classes and performed system and integration testing.
- Worked on AJAX implementation for retrieving content and displaying it without reloading the existing page.
- Enhanced the existing functionality to improve performance and fixed bugs.
- Gathered requirements, created user stories for the Business Requirement Document and prepared a Functional Specification document.
- Provided round-the-clock on-call support.
Environment: Java 1.6, JDBC, XML, AJAX, Oracle, Microsoft Office 2007, MS Outlook 2007, SharePoint.
Confidential
Software Developer
Responsibilities:
- Developed the application using the Struts framework, which leverages the classical Model View Controller (MVC) architecture.
- Used UML diagrams such as use case, class, interaction (sequence and collaboration) and activity diagrams.
- Gathered business requirements and wrote functional specifications and detailed design documents.
- Extensively used Core Java, Servlets, JSP and XML.
- Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 9i database.
- Implemented Enterprise Logging service using JMS and Apache CXF.
- Developed unit test cases and used JUnit for unit testing of the application.
- Implemented Framework Component to consume ELS service.
- Implemented JMS producer and consumer using Mule ESB.
- Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
- Designed Low Level design documents for ELS Service.
- Developed SQL stored procedures and prepared statements for updating and accessing data from database.
- Development carried out under Eclipse Integrated Development Environment (IDE).
- Used JBoss for deploying various components of application.
- Involved in Unit testing, Integration testing and User Acceptance testing.
- Used Java and SQL day to day to debug and fix issues with client processes.
Environment: Java, Spring Core, JBoss, JUnit, JMS, JDK, SVN, Maven, Servlets, JSP and XML.
