Hadoop Developer Resume
Plano, TX
SUMMARY:
- 7+ years of professional IT experience in the analysis, testing, documentation, deployment, integration, and maintenance of web-based and client/server applications.
- Qualified Hadoop developer with experience in Hadoop, database management system architecture, core Java, and testing and implementing Big Data solutions.
- Good experience in developing and implementing big data solutions and data mining applications on Hadoop using HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Kafka, Storm, Spark, Oozie, and ZooKeeper.
- Strong experience in analyzing data using HiveQL, Pig scripts, and custom MapReduce programs in Java (see the sketch at the end of this summary).
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
- Experience in importing and exporting data between relational database systems and HDFS using Sqoop, and in ingesting log data into HDFS using Flume.
- Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Proficient in processing data using Apache Tez jobs.
- Expertise in real-time data ingestion into HBase and Hive using Storm.
- Expertise in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Good experience in loading unstructured data into HDFS using Flume/Kafka.
- Excellent experience in working with compression codecs such as Snappy and Gzip.
- Expertise in managing and reviewing Hadoop Log files.
- Hands on experience in in-memory data processing with Apache Spark.
- Hands on experience in data cleaning, transformation and pushing data as delimited files into HDFS using Informatica Developer.
- Worked on ETL tools like Talend to extract, transform and load data according to the requirement.
- Extensively used ETL methodologies to support data extraction, transformation, and loading on Hadoop.
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Storm cluster.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Excellent communication, interpersonal, and problem-solving skills; a strong team player with a positive attitude.
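The MapReduce-in-Java experience cited above can be pictured with a minimal, generic sketch: a mapper/reducer pair that counts records per key in tab-delimited input, plus the driver that wires them together. The class names, field layout, and paths are hypothetical and not taken from any project listed below.

    // Minimal MapReduce sketch: count records per key in tab-delimited input.
    // All names (RecordCount, field positions, paths) are illustrative only.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class RecordCount {

        // Mapper: emit (first field, 1) for every well-formed line.
        public static class CountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            private final Text outKey = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t");
                if (fields.length > 0 && !fields[0].isEmpty()) {
                    outKey.set(fields[0]);
                    context.write(outKey, ONE);
                }
            }
        }

        // Reducer: sum the counts emitted for each key.
        public static class CountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "record count");
            job.setJarByClass(RecordCount.class);
            job.setMapperClass(CountMapper.class);
            job.setReducerClass(CountReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }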
TECHNICAL SKILLS:
Hadoop/Big Data: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Spark, Hive, ZooKeeper, Oozie, Tez, Pig, Sqoop, Flume, Kafka, Storm, Ganglia, Nagios.
Development Tools: Eclipse, IBM DB2 Command Editor, SQL Developer, Microsoft Suite (Word, Excel, PowerPoint, Access).
Programming/Scripting Languages: Java, SQL, Unix Shell Scripting, Python.
Databases: Oracle 11g/10g/9i, MySQL, PL/SQL, SQL Server 2005/2008, DB2
NoSQL Databases: HBase, Cassandra, MongoDB
ETL: Informatica, Talend
Web Tools: HTML, JavaScript, XML, XSL, DOM
Methodologies: Agile/ Scrum, Waterfall
Operating Systems: Windows 98/2000/XP/Vista/7/8/10, Mac OS, Unix, Linux, and Solaris.
Monitoring & Reporting Tools: Ganglia, Nagios, Custom shell reports
PROFESSIONAL EXPERIENCE:
Confidential, Plano, TX
Hadoop Developer
Responsibilities:
- Involved in the full project life cycle, from design, analysis, and logical/physical architecture modeling through development, implementation, and testing.
- Developing MapReduce programs to parse the raw data and store the refined data in tables.
- Ingesting, analyzing, and processing data and storing the results in HDFS, Hive, and HBase, using Sqoop for data transfer.
- Responsible for managing data from various sources and their metadata using Hive.
- Working with Hive partitioning and bucketing of data to improve query performance across different kinds of data sources.
- Involved in extracting data from various data sources into HDFS. Used Sqoop to efficiently transfer data between RDBMS and HDFS, and Flume to stream log data from servers.
- Worked extensively in creating MapReduce jobs to power data for search and aggregation.
- Altered existing Scala programs to enhance performance and obtain partitioned results.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developing Spark code in Scala and Spark SQL for faster testing and processing of data (see the sketch below).
- Exporting analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Involved in loading data into the Cassandra NoSQL database.
- Working with Oozie to automate the flow and coordination of jobs in the cluster.
Environment: Hadoop 0.20.2, Hive, HBase, Apache Sqoop, Scala, Pig, Spark, Oozie, Cassandra, Cloudera Manager.
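As a rough illustration of the Spark and Hive partitioning work described in this role, the sketch below reads a delimited feed, derives a partition column, and writes a partitioned Hive table. It uses Spark's Java API so that all sketches in this document stay in one language (the project itself used Scala), and the paths, table name, and column names are assumptions.

    // Hedged sketch: read delimited input, derive a partition column, and write
    // a partitioned Hive table with Spark SQL. Paths, table, and column names
    // are hypothetical.
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.to_date;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class PartitionedLoad {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("partitioned-load")
                    .enableHiveSupport()                 // required to write Hive tables
                    .getOrCreate();

            // Hypothetical pipe-delimited source landed in HDFS.
            Dataset<Row> raw = spark.read()
                    .option("header", "true")
                    .option("sep", "|")
                    .csv("hdfs:///data/landing/events/");

            // Basic cleansing plus a partition column derived from the event timestamp.
            Dataset<Row> refined = raw
                    .filter(col("event_id").isNotNull())
                    .withColumn("event_date", to_date(col("event_ts")));

            // Partitioned write into the Hive metastore; bucketing could be added
            // with bucketBy() when saving as a managed table.
            refined.write()
                    .mode(SaveMode.Append)
                    .partitionBy("event_date")
                    .saveAsTable("analytics.events_refined");

            spark.stop();
        }
    }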
Confidential, Jersey City, NJ
Hadoop Developer
Responsibilities:
- Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
- Worked on a large-scale Hadoop cluster for distributed data processing and analysis using Sqoop, Hive, Pig, and MapReduce.
- Imported data to HDFS from different databases and exported the processed data to Hive, HBase, and RDBMS using Sqoop.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Optimized MapReduce algorithms using combiners and partitioners to ensure best results (see the sketch below).
- Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data onto HDFS.
- Loading data into the NoSQL database HBase using Pig.
- Developed a robust data-pipeline to cleanse, filter, aggregate, normalize, and de-normalize the data using Apache Pig and Spark.
- Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Coordinated the cluster services using ZooKeeper.
- Developing workflows in Oozie to automate the tasks of loading data into HDFS.
- Actively participated in the collection, analysis, and design of requirements to meet the client's criteria.
- Maintained system integrity of all subcomponents (primarily HDFS, MapReduce, HBase, and Flume).
- Documented all requirements, code, and implementation methodologies for review and analysis purposes.
Environment: HDFS, Hive, Pig, Sqoop, Spark, ZooKeeper, Oozie
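The combiner and partitioner tuning mentioned in this role can be sketched as a custom Partitioner that routes composite keys of the form region|id by their prefix, so all records for one region reach the same reducer; the job settings that register it alongside a combiner are shown as comments. The key format and class names are hypothetical.

    // Hypothetical custom partitioner: route keys of the form "region|id" by
    // region so every record for a region lands on the same reducer.
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class RegionPartitioner extends Partitioner<Text, LongWritable> {
        @Override
        public int getPartition(Text key, LongWritable value, int numPartitions) {
            String region = key.toString().split("\\|", 2)[0];
            // Mask the sign bit so the partition index is always non-negative.
            return (region.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Registered on the job together with a combiner (reusing the reducer class
    // is safe here only because summation is associative and commutative):
    //
    //     job.setPartitionerClass(RegionPartitioner.class);
    //     job.setCombinerClass(SumReducer.class);   // runs map-side, cuts shuffle volume
    //     job.setReducerClass(SumReducer.class);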
Confidential, Phoenix, AZ
Hadoop Developer
Responsibilities:
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Developed MapReduce programs that filter out bad and unnecessary claim records and find unique records based on account type (see the sketch below).
- Analyzed data by performing Hive queries (HiveQL) and running Pig scripts, Spark SQL, Splunk, and Spark Streaming.
- Used Sqoop to import data into HDFS from MySQL database and vice-versa.
- Handled importing of data from various data sources and performed transformations using Hive, MapReduce, and HBase.
- Extensive experience in writing Pig scripts to transform raw data from several big data sources into a baseline data set.
- Configured Flume to extract data from the web server output files and load it into HDFS.
- Worked on the RDBMS system using PL/SQL to create packages, procedures, functions, triggers as per the business requirements.
- Involved in creating Hive tables, loading the data, and writing Hive queries that run internally as MapReduce jobs.
- Responsible for importing and exporting data between HDFS and Oracle Database using Sqoop.
- Extensively worked with partitioned and bucketed tables in Hive and designed both managed and external tables.
- Created and worked with Sqoop jobs with full-refresh and incremental loads to populate Hive external tables.
- Designing and creating Oozie workflows to schedule and manage Hadoop, Hive, Pig, and Sqoop jobs.
Environment: Hadoop, MapReduce, Pig, Hive, Spark, Splunk, HBase, HDFS, MySQL, Sqoop, Flume, Oozie.
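A rough sketch of the claim filtering and deduplication pattern described in this role: the mapper drops malformed or cancelled claim records and keys the rest by account type and claim id, and the reducer keeps a single record per key. The field layout, status code, and class names are assumptions, not details of the actual application.

    // Hedged sketch: drop bad claim records, then keep one record per
    // (account type, claim id). All names and field positions are illustrative.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ClaimDedup {

        // Mapper: skip bad records, key the rest by accountType|claimId.
        public static class FilterMapper extends Mapper<LongWritable, Text, Text, Text> {
            private final Text outKey = new Text();

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                String[] f = line.toString().split(",");
                // Assumed layout: claimId, accountType, status, ...
                if (f.length < 3 || f[0].isEmpty() || "CANCELLED".equals(f[2])) {
                    return;                               // drop bad/unneeded records
                }
                outKey.set(f[1] + "|" + f[0]);
                context.write(outKey, line);
            }
        }

        // Reducer: emit only the first record seen for each key (deduplication).
        public static class DedupReducer extends Reducer<Text, Text, Text, NullWritable> {
            @Override
            protected void reduce(Text key, Iterable<Text> records, Context context)
                    throws IOException, InterruptedException {
                for (Text record : records) {
                    context.write(record, NullWritable.get());
                    break;
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "claim dedup");
            job.setJarByClass(ClaimDedup.class);
            job.setMapperClass(FilterMapper.class);
            job.setReducerClass(DedupReducer.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }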
Confidential, Nashville, TN
ETL Developer
Responsibilities:
- Understood the design requirements.
- Analyzed business process workflows and assisted in the development of ETL procedures for moving data from source to target systems.
- Extensively used ETL to data transfer from different sources like flat files, .csv, XML, VSAM and load the data into the target staging database.
- Designed and implemented appropriate ETL mappings to extract and transform data from various sources to meet requirements.
- Extensively used Informatica transformations such as Source Qualifier, Rank, SQL, Router, Filter, Lookup, Joiner, Aggregator, Normalizer, and Sorter, along with their transformation properties.
- Created sessions, workflows, and post-session email tasks, and performed various workflow monitoring and scheduling tasks.
- Used Informatica Designer to create reusable transformations to be used in Informatica mappings and mapplets.
- Developed slowly changing dimensions according to the data mart schemas.
- Involved in identifying the sources for various dimensions and facts for different data marts according to star schema design pattern.
- Involved in Fine-tuning of sources, targets, mappings and sessions for Performance Optimization.
- Monitored scheduled, running, completed, and failed sessions using the Workflow Monitor, and debugged mappings for failed sessions.
Environment: Informatica PowerCenter 8.5/8.6.1, Oracle 10g, Windows.
Confidential, Atlanta, GA
Java Developer
Responsibilities:
- Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, analysis, design and development.
- Designed and developed the server-side application based on J2EE architecture using the Spring MVC framework (see the sketch below).
- Involved in analysis, design and developing front end/UI using JSP, HTML, DHTML and JavaScript.
- Prepared workflow diagrams using MS Visio and modeled the methods based on OOP methodology.
- Migrated data from flat files, CSV, MS Access, Excel, and OLE DB sources to a SQL database.
- Accountable for guiding projects on the design and execution of data quality initiatives and other data performance measures under established data quality programs.
- Developed the host modules using C++, DB2, and SQL.
- Responsible for creating the front-end code and Java code to suit the business requirements.
- Installed, configured, and administered WebLogic Application Server and deployed JSP, Servlet, and EJB applications.
- Wrote Maven scripts for builds, unit testing, deployment, Checkstyle checks, etc.
Environment: Java, J2EE, JDK, JSP, Eclipse, Maven, HTML, Servlets, SQL, DB2.
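As a minimal illustration of the Spring MVC server-side work described in this role, the sketch below shows a controller that populates a model through a service layer and resolves a JSP view. The URL mapping, view name, and AccountService interface are hypothetical.

    // Hedged sketch of a Spring MVC controller; all names are illustrative.
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;

    @Controller
    @RequestMapping("/accounts")
    public class AccountController {

        private final AccountService accountService;   // hypothetical service layer

        @Autowired
        public AccountController(AccountService accountService) {
            this.accountService = accountService;
        }

        // Renders a JSP view (e.g. /WEB-INF/jsp/accountDetail.jsp) with the model.
        @RequestMapping(value = "/{id}", method = RequestMethod.GET)
        public String showAccount(@PathVariable("id") long id, Model model) {
            model.addAttribute("account", accountService.findById(id));
            return "accountDetail";
        }
    }

    // Minimal stand-in for the hypothetical service layer.
    interface AccountService {
        Object findById(long id);
    }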
Confidential
Java Developer
Responsibilities:
- Developed front-end screens using jQuery, JavaScript, Java, and CSS.
- Responsible for developing platform related logic and resource classes, controller classes to access the domain and service classes.
- Designed and developed the server-side application based on J2EE architecture using the Spring MVC framework.
- Involved in development and enhancement of web client. Involved in enhancements and optimization in Business logic.
- Developed web-based user interfaces using the Struts framework.
- Designed the GUI screens using Struts and configured Log4j for application debugging.
- Involved in the development of test cases for the testing phase.
- Performed end-to-end integration testing of online scenarios and unit testing using the JUnit framework (see the sketch below).
Environment: Java, J2EE, JavaScript, JSP, JSF, Oracle, Eclipse, Log4j
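A minimal illustration of the JUnit-style unit testing mentioned in this role. The class under test (PremiumCalculator) and its behaviour are hypothetical and defined inline so the sketch stays self-contained.

    // Hedged JUnit 4 sketch; the calculator and its rules are illustrative only.
    import static org.junit.Assert.assertEquals;

    import org.junit.Before;
    import org.junit.Test;

    public class PremiumCalculatorTest {

        // Hypothetical class under test, defined inline for self-containment.
        static class PremiumCalculator {
            double applyDiscount(double premium, double discountPercent) {
                if (discountPercent < 0 || discountPercent > 100) {
                    throw new IllegalArgumentException("discount out of range");
                }
                return premium * (1 - discountPercent / 100.0);
            }
        }

        private PremiumCalculator calculator;

        @Before
        public void setUp() {
            calculator = new PremiumCalculator();
        }

        @Test
        public void appliesPercentageDiscount() {
            assertEquals(90.0, calculator.applyDiscount(100.0, 10.0), 0.0001);
        }

        @Test(expected = IllegalArgumentException.class)
        public void rejectsNegativeDiscount() {
            calculator.applyDiscount(100.0, -5.0);
        }
    }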