Hadoop Developer Resume
Charlotte, NC
SUMMARY
- 8+ years of professional IT experience, including 5+ years of Hadoop experience, processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Well experienced with Hadoop ecosystem components such as MapReduce, Mahout, HBase, Oozie, Hive, Sqoop, Pig and Flume, on both Cloudera and Hortonworks distributions.
- Experience in using Automation tools like Chef for installing, configuring and maintaining Hadoop clusters.
- Led innovation by exploring, investigating, recommending, benchmarking and implementing data-centric technologies for the platform.
- Served in a technical leadership role, developing and maintaining the data warehouse and Big Data roadmap and ensuring the data architecture aligns with the business roadmap and analytics capabilities.
- Experienced in Hadoop Architect and Technical Lead roles, providing design solutions and Hadoop architectural direction.
- 4+ years of industry experience in data manipulation and Big Data analytics using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Cassandra, Solr and ZooKeeper.
- Hands-on expertise in row key and schema design with NoSQL databases like MongoDB 3.0.1, HBase, Cassandra and DynamoDB (AWS).
- Extensively worked on Spark with Scala on clusters for analytics, installed on top of Hadoop, and built advanced analytical applications combining Spark with Hive and SQL/Oracle.
- Excellent programming skills at a higher level of abstraction using Scala, Java and Python.
- Hands-on experience developing Spark applications using Spark APIs such as Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
- Strong experience and knowledge of real-time data analytics using Spark, Kafka and Flume.
- Working knowledge of Amazon Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Ran Apache Hadoop, CDH and MapR distributions on Amazon Elastic MapReduce (EMR), backed by EC2.
- Expertise in developing Pig Latin scripts and using Hive Query Language.
- Developed customized UDFs and UDAFs in Java to extend Hive and Pig core functionality (a sketch follows this list).
- Created Hive tables to store structured data into HDFS and processed it using HiveQL.
- Worked on GUI-based Hive interaction tools like Hue and Karmasphere for querying data.
- Experience in validating and cleansing data using Pig statements, and hands-on experience developing Pig macros.
- Designed ETL workflows on Tableau, deployed data from various sources to HDFS, and generated reports using Tableau.
- Performed clustering, regression and classification using the machine learning libraries Mahout and Spark MLlib.
- Good experience with use-case development and software methodologies like Agile and Waterfall.
- Working knowledge of installing and maintaining Cassandra by configuring the cassandra.yaml file per business requirements, and performed reads/writes using Java JDBC connectivity.
- Experience in OLTP and OLAP design, development, testing, implementation and support of enterprise Data warehouses.
- Wrote multiple MapReduce jobs using the Java API, Pig and Hive for data extraction, transformation and aggregation from multiple file formats including Parquet, Avro, XML, JSON, CSV and ORC.
- Good knowledge of build tools like Maven, Gradle and Ant.
- Hands-on experience with various Hadoop distributions: Cloudera (CDH 4/CDH 5), Hortonworks, MapR, IBM BigInsights, Apache and Amazon EMR.
- Knowledge of installation, configuration, support and management of Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions and Amazon Web Services (AWS).
- In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MapReduce Programming Paradigm, High Availability and YARN architecture.
- Used project management services like JIRA for tracking issues and bugs related to code, and GitHub for code reviews.
- Hands-on knowledge of core Java concepts like exceptions, collections, data structures, I/O, multi-threading, and serialization/deserialization in streaming applications.
- Experience in software design, development and implementation of client/server web-based applications using JSTL, jQuery, JavaScript, JavaBeans, JDBC, Struts, PL/SQL, SQL, HTML, CSS, PHP, XML and AJAX, plus a bird's-eye view of the React JavaScript library.
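As an illustration of the Hive UDF work summarized above, here is a minimal sketch in Java using Hive's classic simple-UDF API; MaskUdf and its masking rule are hypothetical examples, not a specific production function.

```java
// Minimal sketch of a custom Hive UDF in Java (Hive 0.x-style simple UDF API).
// MaskUdf and the masking rule are illustrative assumptions.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class MaskUdf extends UDF {
    // Hive resolves evaluate() by reflection; called once per input row.
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;
        }
        String s = input.toString();
        // Keep the last four characters, mask the rest.
        int keep = Math.min(4, s.length());
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < s.length() - keep; i++) {
            masked.append('*');
        }
        masked.append(s.substring(s.length() - keep));
        return new Text(masked.toString());
    }
}
```

Once the jar is added with ADD JAR and the function registered with CREATE TEMPORARY FUNCTION mask AS 'MaskUdf', it can be called from HiveQL like any built-in, e.g. SELECT mask(account_no) FROM accounts.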
TECHNICAL SKILLS
- Apache Hadoop 2.0.0 (7 years), HDFS (6 years), Sqoop 1.4.3 (6 years), HBase (6 years)
- Pig 0.11, Hive 0.10, Flume, MapReduce, Oozie, Hue, HCatalog
- NoSQL: Cassandra
- Java, JSP, Struts 2.0
- Databases: Teradata, Oracle 11g/10g, Microsoft SQL Server, DB2 LUW
- BI and ETL: IBM Cognos, Microsoft SSIS
- Database tools: TOAD for DB2, IBM Data Studio
- Operating systems and scripting: Linux, AIX 6.1, UNIX scripting (8 years)
PROFESSIONAL EXPERIENCE
Hadoop Developer
Confidential - Charlotte, NC
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Understood business needs, analyzed functional specifications and mapped them to development tasks.
- Involved in loading data from Mainframe DB2 into HDFS using Sqoop.
- Handled Delta processing or incremental updates using Hive.
- Responsible for daily ingestion of data from DATALAKE to CDB Hadoop tenant system.
- Developed Pig Latin scripts for transformations while extracting data from source systems.
- Worked on data-issue tickets and provided fixes.
- Monitored and fixed production job failures.
- Reviewed team members' design documents and code.
- Documented system processes and procedures for future reference, including design and code reviews.
- Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
- Implemented data ingestion from multiple sources such as IBM mainframes and Oracle.
- Developed transformations and aggregated the data for large data sets using Pig and Hive scripts.
- Worked on partitioning and bucketing in Hive tables and ran scripts in parallel to improve performance (a sketch follows this list).
- Thorough knowledge of Spark architecture and how RDDs work internally.
- Exposure to Spark SQL.
- Experience in the Scala programming language, used extensively with Spark for data processing.
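A minimal sketch of the partitioned, bucketed Hive layout and delta-style incremental load described above, driven from Java over HiveServer2 JDBC; the host, credentials, table and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DailyDeltaLoad {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hive-host:10000/default", "etl_user", "");
             Statement stmt = conn.createStatement()) {
            // Partition by load date and bucket by user id so daily deltas land
            // in their own directory and joins/sampling stay efficient.
            stmt.execute("CREATE TABLE IF NOT EXISTS web_logs ("
                    + "user_id STRING, url STRING, ts TIMESTAMP) "
                    + "PARTITIONED BY (load_date STRING) "
                    + "CLUSTERED BY (user_id) INTO 32 BUCKETS "
                    + "STORED AS ORC");
            stmt.execute("SET hive.enforce.bucketing=true");
            // Incremental (delta) load: append only the new day's records.
            stmt.execute("INSERT INTO TABLE web_logs "
                    + "PARTITION (load_date='2016-01-02') "
                    + "SELECT user_id, url, ts FROM staging_logs "
                    + "WHERE load_date='2016-01-02'");
        }
    }
}
```

Partitioning by load date keeps each day's delta in its own directory, so the incremental INSERT touches only new data and queries can prune to a single partition.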
Environment: HDFS, Hive, Pig, HBase, Unix Shell Script, Talend, Spark, Scala.
Hadoop Developer
Confidential - Chadds Ford, PA
Responsibilities:
- Worked on a Hadoop cluster that ranged from 4 to 8 nodes during pre-production and was sometimes extended up to 24 nodes during production.
- Built APIs that will allow customer service representatives to access the data and answer queries.
- Designed changes to transform current Hadoop jobs to HBase.
- Handled fixing of defects efficiently and worked with the QA and BA team for clarifications.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, and managing and reviewing data backups and log files.
- Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
- Developed Spark applications using Scala.
- Implemented Bucketing and Partitioning using Hive to assist the users with data analysis.
- Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Extracted large volumes of data feed on different data sources, performed transformations and loaded the data into various Targets.
- Developed database management systems for easy access, storage and retrieval of data.
- Performed DB activities such as indexing, performance tuning, and backup and restore.
- Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Wrote Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language) and custom MapReduce programs in Java.
- Performed various performance optimizations, such as using the distributed cache for small datasets and partitioning, bucketing and map-side joins in Hive (see the sketch after this list).
- Expert in creating Pig and Hive UDFs using Java to analyze data efficiently.
- Responsible for loading the data from BDW Oracle database, Teradata into HDFS using Sqoop.
- Implemented AJAX, JSON and JavaScript to create interactive web screens.
- Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB.
- Involved in creating Hive tables and applying HiveQL on them, which invokes and runs MapReduce jobs automatically.
- Supported applications running on Linux machines.
- Developed data-formatted web applications and deployed scripts using HTML5, XHTML, CSS and client-side JavaScript.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data, and analyzed them by running Hive queries and Pig scripts.
- Participated in requirement gathering from the Experts and Business Partners and converting the requirements into technical specifications.
- Used Zookeeper to manage coordination among the clusters.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suits the current requirements.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Assisted application teams in installing Hadoop updates, operating system patches and version upgrades when required.
- Assisted in cluster maintenance, monitoring and troubleshooting, and managed and reviewed data backups and log files.
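A hedged sketch of the distributed-cache map-side join mentioned above: a small reference file is shipped to every task and joined in memory, avoiding a reduce-side shuffle. The file name, record layout and field positions are hypothetical; the driver would register the file with job.addCacheFile(new URI("/ref/depts.txt#depts.txt")).

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Map<String, String> deptById = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        // The driver shipped the small lookup file via the distributed cache
        // with a "#depts.txt" fragment, so it is symlinked into the task's
        // working directory under that name.
        try (BufferedReader reader = new BufferedReader(new FileReader("depts.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",", 2); // dept_id,dept_name
                deptById.put(parts[0], parts[1]);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Large side: emp_id,name,dept_id. Join against the in-memory lookup,
        // so no shuffle or reducer is needed for the join itself.
        String[] fields = value.toString().split(",");
        String dept = deptById.getOrDefault(fields[2], "UNKNOWN");
        context.write(new Text(fields[0]), new Text(fields[1] + "," + dept));
    }
}
```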
Environment: Apache Hadoop 2.0.0, Pig 0.11, Hive 0.10, Sqoop 1.4.3, Flume, MapReduce, JSP, Struts 2.0, NoSQL, HDFS, Teradata, Linux, Oozie, Cassandra, Hue, HCatalog, Java, IBM Cognos, Oracle 11g/10g, Microsoft SQL Server, Microsoft SSIS, DB2 LUW, TOAD for DB2, IBM Data Studio, AIX 6.1, UNIX Scripting
Hadoop Developer
Confidential - Boston, MA
Responsibilities:
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and SQOOP.
- Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Extensively involved in the Design phase and delivered Design documents.
- Involved in Testing and coordination with business in User testing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experienced in defining job flows.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Experienced in managing and reviewing the Hadoop log files.
- Used Pig as an ETL tool for transformations, joins and pre-aggregations before storing data in HDFS (a sketch follows this list).
- Loaded and transformed large sets of structured data.
- Responsible for managing data coming from different sources.
- Involved in creating Hive Tables, loading data and writing Hive queries.
- Utilized Apache Hadoop environment by Cloudera.
- Created the data model for Hive tables.
- Involved in Unit testing and delivered Unit test plans and results in documents.
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Worked on Oozie workflow engine for job scheduling.
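A minimal sketch of the Pig-as-ETL flow described above, embedded in Java through Pig's PigServer API; the HDFS paths and field layout are illustrative assumptions.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class LogEtl {
    public static void main(String[] args) throws Exception {
        // Run Pig Latin against the cluster rather than locally.
        PigServer pig = new PigServer(ExecType.MAPREDUCE);
        // Load raw tab-delimited logs from HDFS (path and schema hypothetical).
        pig.registerQuery("raw = LOAD '/data/logs' USING PigStorage('\\t') "
                + "AS (ts:chararray, user:chararray, url:chararray);");
        // Pre-aggregate hits per URL before storing back to HDFS.
        pig.registerQuery("grouped = GROUP raw BY url;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group AS url, "
                + "COUNT(raw) AS hits;");
        pig.store("counts", "/data/logs_aggregated");
    }
}
```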
Environment: HTML5, SCSS, CSS3, Mix Panel, Mustache, Glyph icons, Bootstrap, AngularJS, Spring AOP, Hibernate, Promises, Bower, NPM, React.js, Redux, NET, AWS, RESTful, Nodejs
Hadoop Developer
Confidential - St. Louis, MO
Responsibilities:
- Worked on analyzing Hadoop cluster using different big data analytic tools including Flume, Pig, Hive, Sqoop & Spark.
- Developed Spark code using Scala for faster processing of data.
- Followed the Agile development methodology to develop the application.
- Developed Spark SQL to load tables into HDFS and run select queries on top.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (see the sketch after this list).
- Used Spark DataFrames, Spark SQL and Spark MLlib extensively.
- Integrated Apache Storm with Kafka to perform web analytics.
- Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into the target database.
- Created, altered and deleted topics (Kafka queues) when required.
- Performed performance tuning using partitioning and bucketing of Impala tables.
- Experience with NoSQL databases such as HBase and MongoDB; involved in cluster maintenance and monitoring.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Involved in loading data from UNIX file system to HDFS.
- Created an email notification service that alerts the team that requested the data upon job completion.
- Worked on NoSQL databases, which differ from classic relational databases.
- Conducted requirements-gathering sessions with various stakeholders.
- Involved in knowledge transition activities to the team members.
- Successful in creating and implementing complex code changes.
- Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Configured EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
- Experience with S3, CloudFront and Route 53.
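A minimal sketch of the Kafka-to-Spark Streaming integration described above, using the Kafka 0.10 direct-stream API from Java; the broker address, topic name and group id are placeholders.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import scala.Tuple2;

public class ClickstreamCounter {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("ClickstreamCounter");
        // Divide the stream into 10-second micro-batches for the Spark engine.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "clickstream-group");

        // Direct stream: executors read Kafka partitions without a receiver.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("clickstream"), kafkaParams));

        // Count identical events within each micro-batch.
        stream.mapToPair(record -> new Tuple2<>(record.value(), 1L))
              .reduceByKey(Long::sum)
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Each micro-batch is consumed directly from Kafka partitions, counted and printed; a real job would write to HDFS or HBase, as in the clickstream bullet above.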
Environment: Hadoop v2/YARN 2.4, Spark, AWS, MapReduce, Teradata 15.0, Hive, REST, Sqoop, Flume, Pig, Cloudera, Kafka, SSRS.
Java Developer
Confidential
Responsibilities:
- Involved in requirements collection & analysis from the business team.
- Created the design documents with use case, class and sequence diagrams using Rational Rose.
- Implemented the MVC architecture using Apache Struts framework.
- Implemented Action classes and server-side validations for account activity, payment history and transactions.
- Implemented views using struts tags, JSTL and Expression Language.
- Implemented session beans to handle the business logic for fund transfer, loan, credit card & fixed deposit modules.
- Worked with various Java patterns, such as Singleton and Factory, at the business layer for effective object behavior.
- Worked on the Java collections API for handling the data objects between the business layers and the front end.
- Developed unit test cases using JUnit (a sketch follows this list).
- Developed build scripts and builds using Apache Ant.
- Used ClearCase for source code maintenance.
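A minimal sketch of the kind of JUnit test written for this work, here exercising the hypothetical MaskUdf sketch shown in the summary section, in era-appropriate JUnit 4 style.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNull;
import org.apache.hadoop.io.Text;
import org.junit.Test;

// Exercises the MaskUdf sketch from the summary section above.
public class MaskUdfTest {
    private final MaskUdf udf = new MaskUdf();

    @Test
    public void masksAllButLastFourCharacters() {
        assertEquals(new Text("*****6789"), udf.evaluate(new Text("123456789")));
    }

    @Test
    public void shortValuesAreReturnedUnmasked() {
        assertEquals(new Text("123"), udf.evaluate(new Text("123")));
    }

    @Test
    public void nullInputYieldsNull() {
        assertNull(udf.evaluate(null));
    }
}
```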
Environment: J2EE 1.4, Java, Tiles, JSP 1.2, JavaMail, ClearCase, ANT, JavaScript, JMS
Java Developer
Confidential
Responsibilities:
- Involved in the design and implementation of server-side programming.
- Involved in gathering requirements, analyzed them and prepared high-level documents.
- Participated in all client meetings to understand the requirements.
- Actively involved in designing and data modelling using the Rational Rose tool (UML).
- Involved in the design of the SPACE database.
- Designed and developed user interfaces and menus using HTML, JSP, JSP custom tags and JavaScript.
- Implemented the user interface using the Spring Tiles framework.
- Developed, deployed and tested JSPs and Servlets in WebLogic.
- Used Eclipse as the IDE, integrated WebLogic with Eclipse to deploy and develop applications, and used JDBC to connect to the database (a sketch follows this list).
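A minimal sketch of the JSP/Servlet-plus-JDBC pattern used in this role; the servlet name, JDBC URL, credentials and table are hypothetical placeholders, and a production version would use a pooled DataSource rather than DriverManager.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet: looks up a customer name over JDBC and writes it back.
public class CustomerLookupServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String customerId = req.getParameter("id");
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@db-host:1521:orcl", "app_user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                "SELECT name FROM customers WHERE id = ?")) {
            ps.setString(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                out.println(rs.next() ? rs.getString("name") : "Not found");
            }
        } catch (Exception e) {
            throw new ServletException("Lookup failed", e);
        }
    }
}
```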
Environment: Struts Framework, Java 1.3, XML, Data Modelling, JDBC, SQL, PL/SQL, JMS, Web Services, SOAP, Solaris 9, ANT, Toad, Eclipse.
