Sr. Big Data / Hadoop Developer Resume

San Francisco, CA

SUMMARY

  • Over 6 years of experience in Information Technology, including the Big Data and Hadoop ecosystem, with in-depth knowledge and hands-on experience in Apache Hadoop components such as HDFS, MapReduce, HiveQL, HBase, Pig, Hive, Sqoop, Oozie, Cassandra, Flume and Spark.
  • Experience in building Pig scripts to extract, transform and load data onto HDFS for processing. Excellent knowledge of data mapping and of extracting, transforming and loading data from different data sources. Experience in writing HiveQL queries to store processed data into Hive tables for analysis.
  • Excellent understanding and knowledge of NOSQL databases like HBase and Cassandra.
  • Expertise in database design, creation and management of schemas, writing stored procedures, functions, DDL, DML and SQL queries, and data modeling.
  • Well versed in Amazon Web Services (AWS) cloud services such as EC2 and S3.
  • Very good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • Extensively worked on the MRv1 and MRv2 (YARN) Hadoop architectures.
  • Hands on experience in writing MapReduce programs, Pig & Hive scripts.
  • Designing and creating Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and bucketing (a sketch follows this summary).
  • Extending Hive and Pig core functionality by writing custom UDFs
  • Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
  • Experience working on application servers like IBM WebSphere, JBoss, BEA WebLogic and Apache Tomcat.
  • Extensively used Kafka to load log data from multiple sources directly into HDFS. Knowledge of RabbitMQ. Loaded streaming log data from various web servers into HDFS using Flume.
  • Proficient in using RDBMS concepts with Oracle, SQL Server and MySQL.
  • Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Hands on experience in application development using Java, RDBMS and Linux shell scripting.
  • Worked extensively with CDH3, CDH4 and SOA.
  • Skilled in leadership, self-motivated, and able to work effectively in a team. Possess excellent communication and analytical skills along with a can-do attitude.
  • Strong work ethic with a desire to succeed and make significant contributions to the organization. Experience in processing different file formats like XML, JSON and sequence files.
  • Good knowledge of Amazon AWS concepts like EMR and EC2 web services, which provide fast and efficient processing of Big Data.
  • Expert in deploying code through web application servers like WebSphere, WebLogic and Apache Tomcat in the AWS cloud.
  • Good Experience in creating Business Intelligence solutions and designing ETL workflows using Tableau.
  • Experience with the NumPy, Matplotlib, pandas, Seaborn, Plotly and Cufflinks Python libraries.
  • Worked on large datasets using PySpark, NumPy and pandas.
  • Good experience in Agile engineering practices, Scrum, Test-Driven Development and Waterfall methodologies.
  • Hands-on Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology.
  • Exposure to Java development projects.
  • Hands-on experience in database design using PL/SQL to write stored procedures, functions and triggers, and strong experience in writing complex queries using Oracle, DB2 and MySQL.
  • Good working experience on different operating systems such as UNIX/Linux, Apple macOS and Windows.
  • Experience working both independently and collaboratively to solve problems and deliver high quality results in a fast-paced, unstructured environment.
  • Exhibited strong written and oral communication skills. Rapidly learns and adapts to emerging technologies and paradigms.
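
The Hive external-table work described above (partitioned, dynamically partitioned, bucketed tables registered in a shared metastore) can be illustrated with a minimal sketch. This is hypothetical, not project code: the table name, columns, bucket count and HDFS location are placeholder assumptions, and it presumes a SparkSession built with Hive support against the shared metastore.

```scala
// Minimal sketch: a partitioned, bucketed external Hive table created via Spark SQL.
// Table name, columns, bucket count and HDFS path are illustrative placeholders.
import org.apache.spark.sql.SparkSession

object HiveExternalTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-external-table-sketch")
      .enableHiveSupport()               // shared Hive metastore instead of embedded Derby
      .getOrCreate()

    // External table: dropping it removes only metadata, the HDFS files stay in place.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS web_events (
        |  user_id STRING,
        |  url     STRING,
        |  ts      BIGINT)
        |PARTITIONED BY (event_date STRING)
        |CLUSTERED BY (user_id) INTO 16 BUCKETS
        |STORED AS ORC
        |LOCATION '/data/warehouse/web_events'""".stripMargin)

    // Allow dynamic partitioning so partitions are derived from the data at insert time.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.stop()
  }
}
```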

TECHNICAL SKILLS

Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Flume, Oozie, Hadoop distributions, HBase, Spark, Redshift Spectrum

Hadoop Ecosystem: Spark Core, Kafka, Spark SQL, HDFS, YARN, Sqoop, Pig, Hive, Oozie, Flume, MapReduce, Storm

Development and Build Tools: Eclipse, NetBeans, IntelliJ, ANT, Maven, Ivy, TOAD, SQL Developer

Databases: HBase, Cassandra, Oracle 9i/10g/11g, SQL Server 2005/2008 R2/2012, MySQL, ODI, SQL/PL-SQL

Languages: Java (JDK 1.4/1.5/1.6), C/C++, SQL, PL/SQL, Scala, Python

Operating Systems: Windows Server 2000/2003/2008, Windows XP/Vista, Mac OS, UNIX, Linux

Java Technologies: Spring 3.0, Struts 2.2.1, Hibernate 3.0, Spring-WS, Apache Kafka

Frameworks: JUnit and Jest

IDEs & Utilities: Eclipse, Maven, NetBeans

SQL Server Tools: SQL Server Management Studio, Enterprise Manager, Query Analyzer, Profiler, Export & Import (DTS)

Web Dev. Technologies: ASP.NET, HTML, HTML5, XML, CSS3, JavaScript/jQuery

PROFESSIONAL EXPERIENCE

Confidential, San Francisco, CA

Sr. Big Data / Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark.
  • Involved in loading data from the Linux file system, servers and Java web services using Kafka producers and partitions.
  • Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
  • Implemented Storm topologies to pre-process data before moving it into HDFS.
  • Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
  • Implemented a POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
  • Migrated complex MapReduce programs into Spark RDD transformations and actions (see the sketch after this list).
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
  • Loaded data into Amazon Redshift and used AWS CloudWatch to collect and monitor AWS RDS instances within Confidential.
  • Developed and executed a migration strategy to move Data Warehouse from an Oracle platform to AWS Redshift.
  • Wrote various data normalization jobs for new data ingested into Redshift.
  • Advanced knowledge of Confidential Redshift and MPP database concepts.
  • Wrote scripts and an indexing strategy for a migration to Confidential Redshift from SQL Server and MySQL databases.
  • Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift.
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Wrote HBase client programs in Java and web services.
  • Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
  • Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
  • Experienced in developing custom input formats and data types to parse and process unstructured and semi structured input data and mapped them into key value pairs to implement business logic in MapReduce.
  • Involved in using HCatalog to access Hive table metadata from MapReduce code.
  • Experience in implementing custom serializers, interceptors, sources and sinks in Flume, as required, to ingest data from multiple sources.
  • Experience in setting up fan-out flows in Flume, designing a V-shaped architecture that takes data from many sources and ingests it into a single sink.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Evaluated usage of Oozie for Workflow Orchestration.
  • Converted unstructured data to structured data by writing Spark code.
  • Indexed documents using Apache Solr.
  • Set up SolrCloud for distributed indexing and search.
  • Automated all jobs, from pulling data from different data sources such as MySQL and pushing the result datasets to HDFS, to running MapReduce, Pig and Hive jobs, using Kettle and Oozie for workflow management.
  • Worked on NoSQL databases like Cassandra and MongoDB for POC purposes, storing images and URIs.
  • Integrated bulk data into the Cassandra file system using MapReduce programs.
  • Used the Talend ETL tool to develop multiple jobs and set up workflows.
  • Created Talend jobs to copy files from one server to another and utilized Talend FTP components.
  • Worked on MongoDB for distributed storage and processing.
  • Designed and implemented Cassandra and associated RESTful web service.
  • Created partitioned tables in Hive and mentored the analyst and SQA teams in writing Hive queries.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Involved in cluster setup, monitoring, test benchmarks for results.
  • Involved in building and deploying applications using Maven, integrated with the Jenkins CI/CD server.
  • Involved in Agile methodologies, daily Scrum meetings and sprint planning.
  • Handled all Azure management tools on a daily basis.
  • Involved in analyzing system failures, identifying root causes and recommending courses of action. Documented system processes and procedures for future reference.
  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Administered and maintained Cloudera Hadoop clusters; provisioned and patched physical Linux systems.
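
As a concrete illustration of the MapReduce-to-Spark migration bullets above, here is a minimal, hypothetical sketch: the HDFS paths and the simple token-count logic are placeholder assumptions, showing only how a map/reduce pair becomes RDD transformations followed by an action.

```scala
// Minimal sketch: a MapReduce-style count reimplemented as Spark RDD transformations.
// Input/output paths are illustrative placeholders.
import org.apache.spark.{SparkConf, SparkContext}

object LogEventCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("log-event-count-sketch"))

    val lines = sc.textFile("hdfs:///data/raw/app_logs/*")        // was: Mapper input
    val counts = lines
      .flatMap(_.split("\\s+"))                                   // map phase
      .filter(_.nonEmpty)
      .map(token => (token, 1L))
      .reduceByKey(_ + _)                                         // reduce phase

    counts.saveAsTextFile("hdfs:///data/processed/event_counts")  // action triggers the job
    sc.stop()
  }
}
```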

Environment: Hadoop, Confluent Kafka, Apache Kafka, Apache Cassandra, Hortonworks HDF, HDP, NiFi, Linux, UNIX, Splunk, SOA, YARN, Cloudera 5.13, Spark, Spark Streaming, Hive, Tez, Pig, Sqoop, Hue, Oozie, Tableau, Microsoft Azure, Data Fabric, Data Mesh, AWS, ETL, Teradata, Java, Scala, Python, Git

Confidential, NJ

Sr. Hadoop / Big Data Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytics tools including Pig, HBase and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Implemented a nine-node CDH3 Hadoop cluster on CentOS.
  • Implemented the Apache Crunch library on top of MapReduce and Spark for data aggregation.
  • Involved in loading data from the Linux file system to HDFS.
  • Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slots configuration.
  • Implemented a script to transmit subscriber information from Oracle to HBase using Sqoop.
  • Implemented best income logic using Pig scripts and UDFs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Applied design patterns and OO design concepts to improve the existing Java/J2EE based code base.
  • Developed JAX-WS web services
  • Wrote HBase client programs in Java and web services.
  • Developed simple to complex MapReduce streaming jobs using Java language for processing and validating the data.
  • Developed data pipeline using MapReduce, Flume, Sqoop and Pig to ingest customer behavioral data into HDFS for analysis.
  • Developed MapReduce and Spark jobs to discover trends in data usage by users.
  • Implemented Spark using Python and Spark SQL for faster processing of data.
  • Implemented algorithms for real-time analysis in Spark.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
  • Used the Spark-Cassandra connector to load data to and from Cassandra.
  • Streamed data in real time using Spark with Kafka and SOA (see the sketch after this list).
  • Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the processed data into HDFS.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Pig Latin scripts to perform MapReduce jobs.
  • Developed product profiles using Pig and commodity UDFs.
  • Developed Hive scripts in HiveQL to De-Normalize and Aggregate the data.
  • Created HBase tables and column families to store the user event data.
  • Wrote automated HBase test cases for data quality checks using HBase command-line tools.
  • Created UDFs to store specialized data structures in HBase and Cassandra.
  • Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
  • Used Impala to read, write and query the Hadoop data in HDFS from HBase or Cassandra.
  • Used Tez framework for building high performance jobs in Pig and Hive.
  • Configured Kafka to read and write messages from external programs.
  • Configured Kafka to handle real time data.
  • Developed end-to-end data processing pipelines, from receiving data via the Kafka distributed messaging system through persisting the data into HBase.
  • Wrote Storm topologies to emit data into Cassandra DB.
  • Wrote Storm topologies to accept data from Kafka producers and process the data.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Used the JUnit framework to perform unit testing of the application.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading process.
  • Performed data validation on the data ingested using MapReduce by building a custom model to filter all the invalid data and cleanse the data.
  • Experience with data wrangling and creating workable datasets.
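
To illustrate the real-time Spark-with-Kafka streaming noted above, here is a minimal sketch assuming the spark-streaming-kafka-0-10 integration; the broker list, consumer group and topic name are illustrative placeholders rather than project details, and the per-batch processing is deliberately trivial.

```scala
// Minimal sketch: consuming a Kafka topic with Spark Streaming (kafka-0-10 direct stream).
// Brokers, group id and topic are illustrative placeholders.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-stream-sketch"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clickstream-consumers",
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("clickstream"), kafkaParams))

    // Trivial per-batch processing before persisting downstream (e.g. HDFS or HBase).
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```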

Environment: Hadoop, HDFS, Pig, Sqoop, HBase, shell scripting, CDH3, CentOS, UNIX, T-SQL, MapReduce, Spark, Hive, Oozie, ZooKeeper, Flume, Solr, Storm, Tez, Impala, Mahout, Cassandra, Cloudera Manager, REST, MySQL, Jaspersoft, multi-node cluster with Linux (Ubuntu), Windows.

Confidential, IL

Hadoop Developer

Responsibilities:

  • Worked with engineering to strategize and develop data flow solutions using Hadoop, Hive and Java in order to address long-term technical and business needs.
  • Involved in all phases of the SDLC, from gathering requirements from business users and designing the structure of the application, through UML design, testing, deployment and maintenance of the application.
  • Involved in building the Hadoop cluster environment and its ecosystem components.
  • Configured the properties of the cluster according to the application requirements.
  • Developed MapReduce programs to handle multiple ad hoc requests.
  • Developed MapReduce programs to pre-process the data, structure the unstructured data and filter the raw data.
  • Developed MapReduce programs in Java to get the metrics of the application.
  • Performed unit testing on the MapReduce jobs by writing MRUnit test cases.
  • Created partitioned, external and managed Hive tables to store the structured data produced by MapReduce programs, using pre-defined and custom-defined SerDes.
  • Imported data from SQL databases to Hadoop/HDFS using Sqoop.
  • Developed Hive UDFs in Java to meet application requirements.
  • Developed Hive queries to perform analytics on the data stored in Hive tables.
  • Improved Hive query performance by partitioning, bucketing and indexing the Hive tables.
  • Experienced in writing SQL queries and procedures in MySQL.
  • Developed Pig Latin scripts for ad hoc requests.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Designed the HBase schema and created the HBase tables to store the data (see the sketch after this list).
  • Worked on bulk loading the data into HBase tables.
  • Redesigned the HBase tables to improve the performance according to the query requirements.
  • Scheduled Hadoop ecosystem jobs using Oozie
  • Developed UNIX scripts to trigger MapReduce jobs.
  • Visualized the analytical results using Tableau visualization tool.
  • Performed the regression testing on the application.
  • Deployed the application on Amazon Web Services (AWS).
  • Debugged production issues, performed root cause analysis and applied fixes accordingly.
  • Worked on reducing the cost of production jobs by applying techniques such as compression.
  • Developed different SOAP-based web services using WSDL and SOAP.
  • Designed workflows and developed applications implementing the business processes.
  • Developed AJAX scripting to interact with server-side JSP scripting.
  • Incorporated model relationships and access controls for complex APIs using the LoopBack framework.
  • Involved in GUI development for implementing new JSP pages.
  • Developed the Action Classes, Action Form Classes, created JSPs using Struts tag libraries and configured in Struts-config.xml, Web.xml files.
  • Involved in integration of GUI components of the code.
  • Assisted the team lead with task management and Scrums.
  • Involved in unit testing, validating through User Acceptance Testing
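
To illustrate the HBase schema and table work above, here is a minimal client-side sketch using the standard HBase client API from Scala. The table name, column family, row key and values are hypothetical, and it assumes hbase-site.xml is available on the classpath.

```scala
// Minimal sketch: writing and reading one cell with the HBase client API.
// Table, column family, row key and values are illustrative placeholders.
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseClientSketch {
  def main(args: Array[String]): Unit = {
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = connection.getTable(TableName.valueOf("user_events"))
    try {
      // Write one cell into the "events" column family.
      val put = new Put(Bytes.toBytes("user42#2023-01-01"))
      put.addColumn(Bytes.toBytes("events"), Bytes.toBytes("page"), Bytes.toBytes("/home"))
      table.put(put)

      // Read the same cell back.
      val result = table.get(new Get(Bytes.toBytes("user42#2023-01-01")))
      val page = Bytes.toString(result.getValue(Bytes.toBytes("events"), Bytes.toBytes("page")))
      println(s"page = $page")
    } finally {
      table.close()
      connection.close()
    }
  }
}
```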

Environment: Hadoop 2.6.0 (YARN), MapR, Red Hat Linux, CentOS, Java 1.6, Hive 0.13, Pig, MySQL, HBase, Spark, Sqoop, Oozie Scheduler, HDFS, MapReduce, Storm, MongoDB, Python, Accumulo, Kerberos, AWS, Tableau, UNIX shell scripts, Hue, Solr, Git, Maven.

Confidential - Providence, RI

Hadoop Developer

Responsibilities:

  • Provided suggestions on converting to Hadoop using MapReduce, Hive, Sqoop, Flume and Pig Latin.
  • Experience in writing Spark applications for data validation, cleansing, transformations and custom aggregations (see the sketch after this list).
  • Imported data from different sources into Spark RDD for processing.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning and slots configuration.
  • Responsible for managing data coming from different sources.
  • Imported and exported data into HDFS using Flume.
  • Experienced in analyzing data with Hive and Pig.
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
  • Setup and benchmarked Hadoop/HBase clusters for internal use
  • Set up a Hadoop cluster on Amazon EC2 using Apache Whirr for a POC.
  • Worked on developing applications in Hadoop big data technologies: Pig, Hive, MapReduce, Oozie, Flume and Kafka.
  • Experienced in managing and reviewing Hadoop log files.
  • Practical work experience with Hadoop Ecosystem (i.e. Hadoop, Hive, Pig, Sqoop etc.)
  • Experience with UNIX and Linux.
  • Conducted trainings on Hadoop MapReduce, Pig and Hive. Demonstrated up-to-date expertise in Hadoop and applied it to development, execution and improvement efforts.
  • Ensured technology roadmaps were incorporated into data and database designs.
  • Experienced in extracting large data sets.
  • Experience in data management and analysis technologies like Hadoop, HDFS.
  • Created list and summary view reports.
  • Handled communication with the business and understood problems from a business perspective rather than a developer perspective.
  • Prepared the Unit Test Plan and System Test Plan documents.
  • Prepared and executed unit test cases; performed troubleshooting and debugging.
  • Helped with big data technologies for the integration of Hive with HBase and Sqoop with HBase.
  • Analyzed data with Hive, Pig and Hadoop Streaming.
  • Involved in transferring relational-database and legacy tables to HDFS and HBase tables using Sqoop, and vice versa.
  • Involved in cluster coordination services through ZooKeeper and adding new nodes to an existing cluster.
  • Moved the data from traditional databases like MySQL, MS SQL Server and Oracle into Hadoop
  • Worked on Integrating Talend and SSIS with Hadoop and performed ETL operations.
  • Installed Hive, Pig, Flume, Sqoop and Oozie on the Hadoop cluster.
  • Used Flume to collect, aggregate and push log data from different log servers
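
To illustrate the Spark validation, cleansing and custom-aggregation work noted above, here is a minimal DataFrame-based sketch; the input path, column names and aggregation rules are illustrative assumptions, not project specifics.

```scala
// Minimal sketch: cleanse a raw CSV extract, then aggregate per customer with Spark SQL.
// Paths, schema and column names are illustrative placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ValidateAndAggregateSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("validate-and-aggregate-sketch").getOrCreate()
    import spark.implicits._

    val raw = spark.read.option("header", "true").csv("hdfs:///data/landing/orders/*.csv")

    // Cleansing: drop rows missing keys, normalize casing, cast the amount to a numeric type.
    val cleansed = raw
      .filter($"order_id".isNotNull && $"customer_id".isNotNull)
      .withColumn("status", lower(trim($"status")))
      .withColumn("amount", $"amount".cast("double"))

    // Aggregation per customer, exposed as a temp view for interactive querying.
    val perCustomer = cleansed
      .groupBy($"customer_id")
      .agg(count(lit(1)).as("order_count"),
           sum("amount").as("total_amount"),
           avg("amount").as("avg_amount"))

    perCustomer.createOrReplaceTempView("customer_summary")
    spark.sql("SELECT * FROM customer_summary ORDER BY total_amount DESC LIMIT 20").show()
    spark.stop()
  }
}
```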

Environment: Hadoop, Hortonworks, Cloudera Hadoop, Linux, HDFS, MapReduce, Pig, Oracle, SQL Server, Eclipse, Java, Oozie scheduler, Hive, Sqoop, Flume, ZooKeeper and HBase
