Hadoop Developer / Spark Developer Resume
Houston, TX
SUMMARY:
- Over 8 years of IT experience in the analysis, design, development, and implementation of business applications, with thorough knowledge of Java, J2EE, Big Data, the Hadoop ecosystem, and RDBMS technologies.
- Expertise with Hadoop ecosystem tools including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience developing MapReduce programs with Apache Hadoop to analyze big data according to requirements.
- Highly capable of processing large structured, semi-structured, and unstructured datasets in support of Big Data applications.
- Extended Hive and Pig core functionality by writing Pig UDFs in Java and using UDFs from Piggybank and other sources.
- Good experience with Hive partitioning and bucketing, and with performing different types of joins on Hive tables.
- Worked with different file formats (ORCFILE, TEXTFILE) and compression codecs (GZIP, SNAPPY, LZO).
- Proficient in Hadoop data formats such as Avro and Parquet.
- Comprehensive knowledge of and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, and data manipulation in Hive.
- Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Proficient in implementing Hive.
- Used Zookeeper to provide coordination services to the cluster.
- Experience using the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
- Experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Extensive experience importing and exporting data using stream processing platforms such as Flume and Kafka.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
- Developed Apache Spark jobs using Scala in a test environment for faster data processing, and used Spark SQL for querying.
- Experience creating Spark Contexts, Spark SQL Contexts, and Spark Streaming Contexts to process huge datasets.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Experienced in Spark Core, Spark RDDs, pair RDDs, and Spark deployment architectures.
- Extensive experience using Maven and Ant as build tools for building deployable artifacts from source code.
- Worked with Big Data distributions like Cloudera (CDH 3 and 4) with Cloudera Manager.
- Knowledge of cloud technologies such as AWS and Amazon Elastic MapReduce (EMR).
- Proficient in OOP concepts (polymorphism, inheritance, encapsulation, etc.).
- Good knowledge of advanced Java topics such as generics, collections, and multithreading.
- Experience in database development using SQL and PL/SQL, and experience working with databases such as Oracle 9i/10g, SQL Server, and MySQL.
- Excellent interpersonal skills and good experience interacting with clients; a strong team player with good problem-solving skills.
- Strong knowledge of developing object-oriented and distributed applications.
- Wrote unit test cases using JUnit and MRUnit for MapReduce jobs.
- Good understanding of Hadoop v1/v2 architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, Secondary NameNode, and DataNode, with MapReduce concepts, and with the YARN architecture, including NodeManager, ResourceManager, and ApplicationMaster.
- Knowledge of Splunk architecture and its components (indexer, forwarder, search head, deployment server, heavy and universal forwarders) and of its license model.
- Knowledge of machine learning (linear regression, logistic regression, clustering, classification, decision trees, support vector machines, and dimensionality reduction).
- Ability to quickly master new concepts and applications.
TECHNICAL SKILLS:
Big Data Skillset Frameworks & Environments: Cloudera CDH, Hortonworks HDP, Hadoop 1.0, Hadoop 2.0, HDFS, MapReduce, Pig, Hive, Impala, HBase, Data Lake, Cassandra, MongoDB, Mahout, Sqoop, Oozie, ZooKeeper, Flume, Splunk, Spark, Storm, Kafka, YARN, Falcon, Avro.
Java & J2EE Technologies: JSP, Java Beans, Servlets, EJB
Programming & Scripting Languages: Java, C, C++, Scala, Python, SQL, PL/SQL, Linux Shell Scripts
Frameworks: Spring 3.5 - Spring MVC, Spring ORM, Hibernate, Struts.
Web Technologies: HTML, JavaScript
Web Servers: Apache Tomcat
Databases & Application Servers: MS SQL Server & SQL Server Integration Services (SSIS), MySQL, MS Access, MongoDB, Cassandra, Oracle 8i, 9i, 11i & 10g, Teradata, IBM WebSphere, JBoss, WebLogic
IDEs: Eclipse, NetBeans, IntelliJ
AWS: S3, EC2, EMR
Operating Systems: Unix, Windows, Linux, CentOS, Mac OS
Others: PuTTY, WinSCP, Talend, Tableau, GitHub
PROFESSIONAL EXPERIENCE:
Confidential, Houston, TX
Hadoop Developer / Spark Developer
Responsibilities:
- Developed procedures in Oracle PL/SQL and T-SQL, and extracted data from MySQL into HDFS using Sqoop.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, and Scala.
- Extracted large datasets from Cassandra and Oracle servers into HDFS and vice versa using Sqoop.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Created Hive tables to store the processed results in a tabular format.
- Implemented state-based business logic in Hive using generic UDFs.
- Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
- Used Pig for three distinct workloads: pipelines, iterative processing, and research.
- Moved log files generated from various sources to HDFS for further processing through Kafka, Flume, and Splunk.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Used Pig extensively to communicate with Hive through HCatalog and with HBase through handlers.
- Implemented various MapReduce jobs in custom environments and loaded their results into HBase tables by generating Hive queries.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Imported data from sources such as HDFS and HBase into Spark RDDs.
- Used Kafka and Storm for real-time analytics, and AML for data analytics.
Confidential, GA
Hadoop/ETL Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed large data sets by running Hive Queries and Pig scripts.
- Involved in creating Hive tables, loading and analyzing data using Hive Queries.
- Extracted data from Teradata into HDFS using Sqoop.
- Developed simple to complex MapReduce jobs.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Developed Oozie workflows to automate loading data into HDFS and preprocessing it with Pig.
- Mentored the analyst and test teams in writing Hive queries.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Worked with application teams to install Hadoop updates, patches and version upgrades as required.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Implemented best income logic using Pig scripts and UDFs.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance for Hive and Pig queries.
- Developed UNIX Shell scripts to automate repetitive database processes.
Confidential, New York, NY
Hadoop Developer
Responsibilities:
- Collected log data and staging data using Apache Flume and stored it in HDFS for analysis.
- Implemented helper classes that access HBase directly from Java, using the Java API to perform CRUD operations.
- Stored various time-series data in HBase and performed time-based analytics to improve query retrieval time.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Performed debugging and fine tuning in Hive & Pig for improving performance.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Analyzed web log data using HiveQL to extract the number of unique visitors per day.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Performed map-side joins on data in Hive to explore business insights.
- Contributed to forecasting based on current results and insights derived from data analysis.
- Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
- Participated in team discussions to develop useful insights from big data processing results.
- Reported trends based on social media data to senior management.
Confidential
Java Developer
Responsibilities:
- Interacted with business users and developed custom reports based on the defined criteria.
- Gathered requirements and collected information, then analyzed it to prepare a detailed work plan and task breakdown structure.
- Designed and documented the high-level project document for approval and record purposes.
- Involved in the phases of the SDLC (Software Development Life Cycle), including requirement collection, design and analysis of customer specifications, and development and customization of the application.
- Worked on enhancement requests involving front-end and back-end changes using Servlets, Tomcat server, JDBC, and Hibernate.
- Used SQL queries for database integration with the code.
- Created test plans and test data for modified programs and logged the test documents in QC.
- Performed end-to-end system development and testing of each module (unit and system integration).
- Coordinated activities with onshore and offshore teams of 10+ members.
- Responsible for effort estimation and timely production deliveries.
- Created and executed half-yearly and yearly load jobs that update new rates, discounts, etc. for claim calculations in the database and files.
- Received appreciation from the client for proposing and implementing paging logic that prints the Glossary in Explanations of Benefits (EOB) on the previous page, which saved the client significant cost.
- Participated in Hadoop training for development and administration as part of a cross-platform training program.
ENVIRONMENT: Java, J2EE, SQL, Servlets, XML, Hibernate, Eclipse, Git, JUnit, JDBC, Tomcat server
