Sr. Hadoop Developer Resume
Deerfield, IL
PROFESSIONAL SUMMARY:
- Around 8 years of experience in Information Technology, including 6 years in Hadoop/Big Data processing and 2 years in Java/J2EE technologies.
- Comprehensive working experience in implementing Big Data projects using Apache Hadoop, Pig, Hive, HBase, Spark, Sqoop, Flume, Zookeeper, Oozie.
- Experience working on the Hortonworks, Cloudera, and MapR distributions.
- Excellent working knowledge of the HDFS filesystem and Hadoop daemons such as ResourceManager, NodeManager, NameNode, DataNode, Secondary NameNode, and containers.
- In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
- Experience working on Spark and Spark Streaming.
- Hands-on experience with major components in the Hadoop ecosystem like MapReduce, HDFS, YARN, Hive, Pig, HBase, Sqoop, Oozie, Cassandra, Impala and Flume.
- Knowledge in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Spark, Kafka, Storm, Zookeeper and Flume.
- Experience with the Hadoop 2.0 YARN architecture and with developing YARN applications on it.
- Worked on performance tuning to ensure that assigned systems were patched, configured and optimized for maximum functionality and availability. Implemented solutions that reduced single points of failure and improved system uptime to 99.9% availability.
- Experience with distributed systems, large-scale non-relational data stores and multi-terabyte data warehouses.
- Firm grip on data modeling, data marts, database performance tuning, and NoSQL/MapReduce systems.
- Experience in managing and reviewing Hadoop log files
- Real-time, hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
- Experience in setting up Hadoop clusters on cloud platforms such as AWS.
- Customized dashboards and handled identity and access management in AWS.
- Worked with data serialization formats such as Avro, Parquet, JSON and CSV for converting complex objects into serialized byte sequences.
- Expertise in extending Hive and Pig core functionality by writing custom UDFs and UDAFs.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and buckets (see the sketch at the end of this list).
- Worked with different file formats like TEXTFILE, SEQUENCEFILE, AVRO, ORC and PARQUET for Hive querying and processing.
- Proficient in NoSQL databases like HBase.
- Experience in importing and exporting data using Sqoop between HDFS and Relational Database Systems.
- Populated HDFS with vast amounts of data using Apache Kafka and Flume.
- Knowledge in Kafka installation & integration with Spark Streaming.
- Hands-on experience building data pipelines using Hadoop components Sqoop, Hive, Pig, MapReduce, Spark, Spark SQL.
- Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats like text, zip, XML and JSON.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Good understanding of Zookeeper for monitoring and managing Hadoop jobs.
- Monitored MapReduce jobs and YARN applications.
- Strong Experience in installing and working on NoSQL databases like HBase, Cassandra.
- Work experience with cloud infrastructure such as Amazon Web Services (AWS) EC2 and S3.
- Used Git for source code and version control management.
- Experience with RDBMS and writing SQL and PL/SQL scripts used in stored procedures.
- Proficient in Java, J2EE, JDBC, Collection Framework, JSON, XML, REST, SOAP Web services. Strong understanding of Agile and Waterfall SDLC methodologies.
- Experience in working with both small and large groups, meeting new technical challenges and finding solutions to meet the needs of the customer.
- Excellent problem solving, proactive thinking, analytical, programming and communication skills.
- Experience working both independently and collaboratively to solve problems and deliver high-quality results in a fast-paced, unstructured environment.
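As an illustration of the Hive external-table and dynamic-partitioning work noted above, the following is a minimal sketch issued through Spark SQL with Hive support. The table, column, and HDFS path names are illustrative placeholders rather than details from any specific engagement.

import org.apache.spark.sql.SparkSession

object ExternalTableSketch {
  def main(args: Array[String]): Unit = {
    // Spark session with Hive support so DDL and DML go through the shared metastore.
    val spark = SparkSession.builder()
      .appName("hive-external-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table: dropping the table leaves the data at the HDFS location.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales_events (
        |  event_id    STRING,
        |  customer_id STRING,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (event_date STRING)
        |STORED AS ORC
        |LOCATION '/data/warehouse/sales_events'""".stripMargin)

    // Dynamic partitioning: the event_date partition value comes from the data itself.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE sales_events PARTITION (event_date)
        |SELECT event_id, customer_id, amount, event_date
        |FROM staging_sales_events""".stripMargin)

    spark.stop()
  }
}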
TECHNICAL SKILLS:
Big Data Frameworks: Hadoop (HDFS, MapReduce), Spark, Spark SQL, Spark Streaming, Hive, Impala, Kafka, HBase, Flume, Pig, Sqoop, Oozie, Cassandra.
Big Data distributions: Cloudera, Hortonworks, Amazon EMR
Programming languages: Core Java, Scala, Python, Shell scripting
Operating Systems: Windows, Linux (Ubuntu, CentOS)
Databases: Oracle, SQL Server, MySQL
Design Tools: UML, Visio
IDEs: Eclipse, NetBeans
Java Technologies: JSP, JDBC, Servlets, Junit
Web Technologies: XML, HTML, JavaScript, jQuery, JSON
Linux Experience: System Administration Tools, Puppet
Development methodologies: Agile, Waterfall
Logging Tools: Log4j
Application / Web Servers: Apache Tomcat, WebSphere
Messaging Services: ActiveMQ, Kafka, JMS
Version Control Tools: Git and CVS
Others: Putty, WinSCP, Data Lake, Talend, AWS
PROFESSIONAL EXPERIENCE:
Confidential, Deerfield, IL
Sr. Hadoop Developer
Responsibilities:
- Experience with the complete SDLC process, including staging, code reviews, source code management and build processes.
- Implemented Big Data platforms using Cloudera CDH4 as data storage, retrieval and processing systems.
- Experienced in Spark Core, Spark SQL, Spark Streaming.
- Performed transformations on the data using different Spark modules.
- Developed data pipelines using Flume, Sqoop, Pig and Map Reduce to ingest data into HDFS for analysis.
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Implemented Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data into HDFS through Sqoop.
- Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
- Developed a pipeline for continuous data ingestion using Kafka and Spark Streaming (see the sketch at the end of this list).
- Wrote Sqoop scripts for importing large data sets from Teradata into HDFS.
- Performed Data Ingestion from multiple internal clients using Apache Kafka.
- Wrote MapReduce jobs to discover trends in data usage by the users.
- Developed Flume configuration to extract log data from different resources and transfer data with different file formats (JSON, XML, Parquet) to Hive tables using different SerDe's.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Pig.
- Experienced working on Pig to do transformations, event joins, filtering and some pre-aggregations before storing the data onto HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Involved in developing Hive UDFs for needed functionality that is not available out of the box from Hive.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Used HCatalog to access the Hive table metadata from MapReduce and Pig scripts.
- Experience in writing and tuning Impala queries, creating views for ad-hoc and business processing.
- Used Zookeeper operational services for coordinating cluster and scheduling workflows.
- Responsible for executing Hive queries using the Hive command line, the Hue web GUI and Impala to read, write and query data in HBase.
- Developed and executed Hive queries for denormalizing the data.
- Developed the Apache Storm, Kafka, and HDFS integration project to do a real-time data analysis.
- Experience loading and transforming structured and unstructured data into HBase and exposure handling Automatic failover in HBase.
- Ran POCs in Spark to benchmark the implementation.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Automated the end-to-end processing using Oozie workflows and coordinators.
- Involved in developing test framework for data profiling and validation using interactive queries and collected all the test results into audit tables for comparing the results over the period.
- Working on advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala and Python as per requirements.
- Extensively used GitHub as a code repository and Phabricator for managing day to day development process and to keep track of the issues.
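A minimal sketch of the Kafka-to-Spark-Streaming ingestion pattern referenced above, written against the spark-streaming-kafka-0-10 integration (the production pipeline may have used an earlier connector). Broker, topic, consumer group, and HDFS path names are placeholders.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaIngestSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-ingest-sketch")
    val ssc = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-ingest",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Land each non-empty micro-batch as raw text under a timestamped HDFS path.
    stream.map(_.value).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"/data/raw/events/batch_${System.currentTimeMillis()}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}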
Environment: Cloudera, Java, Scala, Hadoop, Spark, HDFS, MapReduce, Yarn, Hive, Pig, Zookeeper, Impala, Oozie, Sqoop, Flume, Kafka, Teradata, SQL, GitHub, Phabricator, Amazon Web Services
Confidential, San Diego, CA
Sr. Big Data Developer
Responsibilities:
- Worked on a live 90-node Hadoop cluster running CDH 4.4.
- Worked with highly unstructured and semi-structured data of 90 TB in size (270 TB with replication).
- Extracted the data from Teradata into HDFS using Sqoop.
- Worked with Sqoop (version 1.4.3) jobs with incremental load to populate Hive External tables.
- Extensive experience in writing Pig (version 0.10) scripts to transform raw data from several data sources into baseline data.
- Experience in Amazon AWS services such as EMR, EC2, S3, CloudFormation, RedShift which provides fast and efficient processing of Big Data.
- Created a data lake on Amazon S3.
- Implemented scheduled downtime for non-prod servers for optimizing AWS pricing.
- Developed Hive (version 0.10) scripts for end user / analyst requirements to perform ad hoc analysis.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
- Developed UDFs in Java as and when necessary for use in Pig and Hive queries (see the sketch at the end of this list).
- Experience in using SequenceFile, RCFile, Avro and HAR file formats.
- Developed Oozie workflow for scheduling and orchestrating the ETL process
- Worked with the admin team on designing and upgrading the cluster environment from CDH 3 to CDH 4.
- Very good experience with both MapReduce 1 (JobTracker) and MapReduce 2 (YARN) setups.
- Worked on performance tuning to ensure that assigned systems were patched, configured and optimized for maximum functionality and availability. Implemented solutions that reduced single points of failure and improved system uptime to 99.9% availability.
- Very good experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
- Experience with professional software engineering practices for the full software development life cycle, including coding standards, source control management and build processes.
- Implemented best income logic using Pig scripts and UDFs.
- Extracted files from CouchDB through Sqoop and placed in HDFS and processed.
- Experience in reviewing Hadoop log files to detect failures.
- Worked on Hive to expose data for further analysis and to generate and transform files from different analytical formats to text files.
- Imported data from MySQL server and other relational databases to Apache Hadoop with the help of Apache Sqoop.
- Created Hive tables and worked on them for data analysis to meet the business requirements.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
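The custom UDFs mentioned above were written in Java; the sketch below shows the same Hive UDF pattern, expressed in Scala only to keep all of the examples here in one language. The function name and its column semantics are hypothetical; Hive resolves evaluate() by reflection once the jar is added and the function registered (ADD JAR ...; CREATE TEMPORARY FUNCTION clean_zip AS '...';).

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: normalizes a free-form ZIP value to its first five digits.
class CleanZipCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else {
      val digits = input.toString.filter(_.isDigit)
      if (digits.length >= 5) new Text(digits.take(5)) else null
    }
  }
}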
Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Sqoop, Pig, Flume, Oracle 11/10g, DB2, Teradata, MySQL, Eclipse, PL/SQL, Java, Linux, Shell Scripting, SQL Developer, SOLR.
Confidential, Tampa, FL
Sr. Hadoop Developer
Responsibilities:
- Worked with highly unstructured and semi-structured data of 120 TB in size (360 TB with replication).
- Developed Hive queries on data logs to perform a trend analysis of user behavior on various online modules.
- Developed Pig UDFs to pre-process the data for analysis.
- Involved in the setup and deployment of Hadoop cluster.
- Developed Map Reduce programs for some refined queries on big data.
- Involved in loading data from UNIX file system to HDFS.
- Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer and Auto Scaling groups.
- Loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to the relational databases using Sqoop and generated reports for the BI team.
- Managed and scheduled jobs on a Hadoop cluster using Oozie.
- Along with the infrastructure team, involved in the design and development of a Kafka- and Storm-based data pipeline.
- Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second. Used Kafka producer 0.6.3 APIs to produce messages.
- Provided daily code contributions and worked in a test-driven development environment.
- Installed, Configured Talend ETL on single and multi-server environments.
- Developed Merge jobs in Python to extract and load data into MySQL database.
- Created and modified several UNIX shell Scripts according to the changing needs of the project and client requirements. Developed UNIX shell scripts to call Oracle PL/SQL packages and contributed to standard framework.
- Developed simple to complex MapReduce jobs using Hive.
- Implemented Partitioning and bucketing in Hive.
- Mentored analyst and test team for writing Hive Queries.
- Involved in setting up of HBase to use HDFS.
- Extensively used Pig for data cleansing.
- Loaded streaming log data from various web servers into HDFS using Flume.
- Performed benchmarking of the NoSQL databases Cassandra and HBase.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data (see the sketch at the end of this list).
- Knowledgeable in Spark and Scala, mainly through framework exploration for the transition from Hadoop/MapReduce to Spark.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Involved in collecting and aggregating enormous amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Configured Flume to extract the data from the web server output files to load into HDFS.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
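A minimal sketch of the Spark SQL processing mentioned above: a Hive-style aggregation re-expressed as a Spark job in Scala. It is written against the current SparkSession API, and the database, table, column, and output path names are assumed placeholders.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, countDistinct, sum}

object ClickstreamRollupSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-rollup-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read an existing Hive table registered in the metastore.
    val clicks = spark.table("web_logs.clickstream")

    // Daily rollup per module, equivalent to a GROUP BY in HiveQL.
    val daily = clicks
      .groupBy(col("event_date"), col("module"))
      .agg(
        countDistinct("user_id").as("unique_users"),
        sum("duration_ms").as("total_duration_ms"))

    daily.write.mode("overwrite").parquet("/data/curated/clickstream_daily")
    spark.stop()
  }
}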
Environment: Unix Shell Scripting, Python, Oracle 11g, DB2, HDFS, Kafka, Storm, Spark, ETL, Java (JDK 1.7), Pig, Linux, Cassandra, MapReduce, MS Access, Toad, SQL, Scala, MySQL Workbench, XML, NoSQL, SOLR, HBase, Hive, Sqoop, Flume, Talend, Oozie
Confidential
Hadoop Developer
Responsibilities:
- Worked with business teams and created Hive queries for ad hoc access.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Involved in review of functional and non-functional requirements.
- Responsible for managing data coming from various sources.
- Loaded daily data from websites to Hadoop cluster by using Flume.
- Involved in loading data from UNIX file system to HDFS.
- Creating Hive tables and working on them using Hive QL.
- Created complex Hive tables and executed complex Hive queries on Hive warehouse.
- Wrote MapReduce code to convert unstructured data to semi structured data.
- Used Pig for the extraction, transformation and loading of semi-structured data.
- Installed and configured Hive and wrote Hive UDFs.
- Developed Hive queries for the analysts.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Designed the technical solution for real-time analytics using Kafka and HBase (see the sketch at the end of this list).
- Provided cluster coordination services through Zookeeper.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Participated in development and execution of system and disaster recovery processes and actively collaborated in all Security Hardening processes on the Cluster.
- Supported the BI data analysts and developers with Hive/Pig development.
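A sketch of the Kafka-to-HBase real-time path referenced above: a plain Kafka consumer that writes each event into an HBase table. It uses current Kafka and HBase client APIs rather than the versions in use at the time, and the broker, topic, table, column-family, and row-key choices are illustrative assumptions.

import java.time.Duration
import java.util.{Collections, Properties}

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.clients.consumer.KafkaConsumer

import scala.collection.JavaConverters._

object KafkaToHBaseSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("group.id", "hbase-writer")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("events"))

    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = connection.getTable(TableName.valueOf("events"))

    try {
      while (true) {
        val records = consumer.poll(Duration.ofSeconds(1))
        for (record <- records.asScala) {
          // Row key: message key when present, otherwise topic-partition-offset.
          val rowKey = Option(record.key())
            .getOrElse(s"${record.topic()}-${record.partition()}-${record.offset()}")
          val put = new Put(Bytes.toBytes(rowKey))
          // Raw payload stored under column family 'd', qualifier 'json'.
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("json"), Bytes.toBytes(record.value()))
          table.put(put)
        }
      }
    } finally {
      table.close()
      connection.close()
      consumer.close()
    }
  }
}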
Environment: Apache Hadoop, HDFS, Cassandra, MapReduce, HBase, Impala, Java (jdk1.6), Kafka, MySQL, Amazon, DB Visualizer, Linux, Sqoop, Apache Hive, Apache Pig, InfoSphere, Python, Scala, NoSQL, Flume, Oozie
Confidential
Java/J2EE Developer
Responsibilities:
- Analyzed and modified Java/J2EE applications using JDK 1.7/1.8 and developed web pages using the Spring MVC framework.
- Coordinated with the business analyst and application architects to maintain knowledge of all functional requirements and ensure compliance with all architecture standards.
- Followed Agile methodology with TDD through all phases of the SDLC.
- Used Connection Pooling to get JDBC connection and access database procedures.
- Attended the daily stand-up meetings.
- Used Rally for managing the portfolio and for creating and tracking user stories.
- Responsible for analysis, design, development and integration of UI components with backend using J2EE technologies.
- Used JUnit to validate input for functions as part of TDD.
- Developed User Interface pages using HTML5, CSS3 and JavaScript.
- Involved in development activities using Core Java/J2EE, Servlets, JSP and JSF for creating web applications, along with XML and Spring.
- Used Maven for building the application and ran it on the Tomcat server.
- Used Git as version control for tracking the changes in the project.
- Used the JUnit framework for unit testing and Selenium for integration testing and test automation.
- Assisted in development for various applications, maintained quality for the same, and performed troubleshooting to resolve all application issues/bugs identified during the test cycles.
Environment: Java/J2EE, JDK 1.7/1.8, LINUX, Spring MVC, Eclipse, JUnit, Servlets, DB2, Oracle 11g/12c, GIT, GitHub, JSON, RESTful, HTML5, CSS3, JavaScript, Rally, Agile/Scrum