
Hadoop Developer Resume


Pleasanton, CA

PROFESSIONAL SUMMARY:

  • Around 8 years of IT experience in software development and support, including developing strategic methods for deploying Big Data technologies to efficiently solve large-scale data processing requirements.
  • Good hands-on experience with Hadoop ecosystem components: HDFS, MapReduce, HBase, YARN, Pig, Spark, Sqoop, Spark SQL, Spark Streaming, and Hive.
  • Experience in Installing, maintaining and configuring Hadoop Cluster.
  • Efficient in processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Capable of creating and monitoring Hadoop clusters on Amazon EC2, Hortonworks Data Platform 2.1 and 2.2, VMs, and CDH3/CDH4 with Cloudera Manager on Linux and Ubuntu.
  • Experience in working with structured data using Hive, Hive UDFs, join operations, partitions, bucketing, and internal/external tables (a brief PySpark/HiveQL sketch follows this summary).
  • Hands on experience in different stages of big data applications such as data ingestion, data analytics and data visualization.
  • Good experience with the Scala programming language and Spark Core.
  • Hands-on experience importing and exporting data between HDFS and relational database management systems using Sqoop.
  • Experience in analyzing data and building data analytics with Hive Query Language (HiveQL).
  • Expertise in Managing and scheduling batch Jobs on a Hadoop Cluster using Oozie.
  • Strong knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
  • Experience setting up clusters on Amazon EC2 and Amazon EMR, and working with Amazon RDS, S3 buckets, DynamoDB, and Redshift.
  • Worked on Oozie and Zookeeper for managing Hadoop jobs.
  • Capable of analyzing data, interpreting results, and conveying findings in a concise and professional manner.
  • Follow a complete end-to-end approach including request analysis, creating/pulling datasets, report creation and implementation, and providing the final analysis to the requestor.
  • Very Good understanding of SQL, ETL and Data Warehousing Technologies.
  • Strong experience in RDBMS technologies like MySQL, Oracle and Teradata.
  • Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors.
  • Experience in developing web services modules for integration using SOAP and REST.
  • Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, and Cosmos.
  • Set up the build environment with Maven by writing Maven build XML and deployment descriptors, taking builds, and configuring and deploying the application on all servers.
  • Experience in writing build scripts using Maven and working with continuous integration systems like Jenkins.
  • Intensive work experience in developing enterprise solutions using Java, J2EE, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF, and MVC.
  • Proficient knowledge of the Java Virtual Machine (JVM) and multithreaded processing.
  • Experience in working with job schedulers like Autosys and Maestro.
  • Strong in databases like Sybase, DB2, Oracle, MS SQL, Clickstream.
  • Loaded the dataset into Hive for ETL operation.
  • Proficient in using various IDEs like RAD, Eclipse.
  • Strong understanding of Agile Scrum and Waterfall SDLC methodologies.
  • Excellent problem-solving, analytical, communication, presentation, and interpersonal skills that help me be a core member of any team.
  • Strong communication, collaboration & team building skills with proficiency at grasping new technical concepts quickly and utilizing them in a productive manner.
  • Experienced in providing training to team members as per new project requirements.
  • Experienced in creating Product Documentation & Presentations.
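
The Hive work summarized above (partitions, bucketing, internal/external tables, HiveQL analytics) can be illustrated with a minimal, hedged sketch. The database, table, and column names below are hypothetical, and a Hive-enabled PySpark environment on the cluster is assumed.

    # Minimal PySpark/HiveQL sketch of the partitioned external-table pattern noted above.
    # All names (sales_db, transactions, load_date, ...) are hypothetical; a Hive-enabled
    # SparkSession is assumed. In Hive itself, a bucketing clause such as
    # "CLUSTERED BY (customer_id) INTO 32 BUCKETS" would typically be added to the DDL.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # External table partitioned by load date; data lives at an existing HDFS location.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.transactions (
            txn_id      STRING,
            customer_id STRING,
            amount      DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS ORC
        LOCATION '/data/sales/transactions'
    """)

    # Query a single partition; the partition predicate prunes HDFS directories.
    spark.sql("""
        SELECT customer_id, SUM(amount) AS total_amount
        FROM sales_db.transactions
        WHERE load_date = '2016-07-01'
        GROUP BY customer_id
    """).show()

Partition pruning means a query like the one above reads only the matching load_date directory, which is what keeps this kind of HiveQL reporting query fast on large tables.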

TECHNICAL SKILLS:

Languages: Java 8 (JDK 1.4/1.5/1.6/1.7/1.8), Java Swing, JSF, JUnit, Log4j, Ant, Maven, Python.

Hadoop Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Spark, Scala, Impala, Kafka, Hue, Sqoop, Oozie, Flume, Zookeeper, Cassandra, CDH5, PySpark

Hadoop Distributions: Cloudera CDH3/4/5, Hortonworks, MapR 5.1/5.2

XML/Web Services: XML, XSD, WSDL, SOAP, Apache Axis, DOM, SAX, JAXP, JAXB, XML Beans, RESTful Web Services

Frameworks: Struts, Spring, Hibernate, Spring MVC, Spring Web Flow, Spring IOC, Spring AOP, Groovy.

Application/Web Servers: WebLogic 8.x/9.x/10.x, JBoss 3.x/4.0, IBM WebSphere 4.0/5.x/6.x.

IDE Tools: Eclipse, Rational Application Developer (RAD), NetBeans, STS.

Databases: Oracle 11g/12c, MySQL, SQL, MongoDB, Cassandra, HiveQL, Spark SQL.

Reporting Tools: Crystal Reports, BO XI R3.

Cloud: AWS.

Version Control Tools: CVS, SVN, Clear Case, Git.

Testing: Selenium, Karma.

Messaging Tools: JMS

Operating Systems: Windows, Linux, Unix, macOS.

PROFESSIONAL EXPERIENCE:

Confidential, Pleasanton, CA

Hadoop Developer

Responsibilities:

  • Installed, configured, and maintained Apache Hadoop clusters for analytics and application development, along with Hadoop tools such as Hive, HSQL, Pig, HBase, OLAP, ZooKeeper, Avro, Parquet, and Sqoop on Arch Linux.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Experience working with DevOps practices.
  • Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
  • Experience in applying structured modelling to unstructured data.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, and slots configuration.
  • Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
  • Worked on Hortonworks Data Platform (HDP).
  • Worked with SPLUNK to analyze and visualize data.
  • Worked on Mesos cluster and Marathon.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Worked with Orchestration tools like Airflow.
  • Wrote test cases, analyzed results, and reported test results to product teams.
  • Good experience on Clojure, Kafka and Storm.
  • Worked with AWS data pipeline.
  • Worked with Elasticsearch, Postgres, and Apache NiFi.
  • Hadoop workflow management using Oozie, Azkaban, Hamake.
  • Responsible for developing data pipeline using Azure HDInsight, flume, Sqoop and pig to extract the data from weblogs and store in HDFS.
  • Installed Oozie workflow engine to run multiple Hive and Pig Jobs, used Sqoop to import and export data from HDFS to RDBMS and vice-versa for visualization and to generate reports.
  • Involved in migration of ETL processes from Oracle to Hive to test the easy data manipulation.
  • Worked in functional, system, and regression testing activities with agile methodology.
  • Worked on Python plugin on MySQL workbench to upload CSV files.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Worked with HDFS storage formats like Avro and ORC.
  • Worked with Accumulo to modify server-side key-value pairs.
  • Working experience with Shiny and R.
  • Working experience with Vertica, QlikSense, QlikView, and SAP BOE.
  • Worked with NoSQL databases like HBase, Cassandra, DynamoDB
  • Worked with AWS based data ingestion and transformations.
  • Good experience with Python, Pig, Sqoop, Oozie, Hadoop Streaming, Hive and Phoenix.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop, including cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
  • Developed several new MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
  • Worked extensively with importing metadata into Hive using Sqoop and migrated existing tables and applications to work on Hive.
  • Extracted, transformed, and loaded data using Talend.
  • Responsible for running Hadoop Streaming jobs to process terabytes of XML data; utilized cluster coordination services through ZooKeeper (a brief Python streaming sketch follows this list).
  • Extensive experience using message-oriented middleware (MOM) with ActiveMQ, Apache Storm, Apache Spark, Kafka, Maven, and ZooKeeper.
  • Worked on the core and Spark SQL modules of Spark extensively.
  • Worked on Descriptive Statistics Using R.
  • Developed Kafka producers and consumers, HBase clients, Spark, Shark, and Streams components, and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Strong working experience with Snowflake and Clickstream.
  • Worked on Hadoop with EMC Greenplum, GemStone, and GemFire.
  • Analyzed existing SQL scripts and designed the solution to implement them using PySpark (see the sketch after this list).
  • Experience using Spark with Neo4j to acquire interrelated graph information about insurers and to query data from the stored graphs.
  • Experience in writing large Scala programs for batch processing.
  • Load and transform large sets of structured, semi structured, and unstructured data using Hadoop/Big Data concepts.
  • Responsible for creating Hive external tables, loading data into them, and querying the data using HQL.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
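
As referenced in the PySpark bullet above, translating an existing SQL script into PySpark can be sketched as follows. This is a hedged example only: the database, table, and column names (web_logs.page_views, user_id, session_id, event_date) are hypothetical, and a Hive-enabled SparkSession is assumed.

    # Hedged sketch: re-implementing a reporting SQL script with the DataFrame API.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("sql-to-pyspark-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Roughly: SELECT user_id, COUNT(*), COUNT(DISTINCT session_id)
    #          FROM web_logs.page_views WHERE event_date >= '2016-01-01' GROUP BY user_id
    usage = (spark.table("web_logs.page_views")
             .filter(F.col("event_date") >= "2016-01-01")
             .groupBy("user_id")
             .agg(F.count(F.lit(1)).alias("page_views"),
                  F.countDistinct("session_id").alias("sessions")))

    # Persist the derived metrics back to Hive for reporting.
    usage.write.mode("overwrite").saveAsTable("analytics.user_usage_metrics")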
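
The Hadoop Streaming work mentioned above can be sketched with a small Python mapper/reducer pair. This is a hedged illustration: it assumes each input line holds one XML record, and all paths and file names are hypothetical.

    #!/usr/bin/env python
    # mapper.py -- emits <root-tag, 1> for each well-formed XML record (one record per line assumed).
    import sys
    import xml.etree.ElementTree as ET

    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            record = ET.fromstring(line)
        except ET.ParseError:
            continue  # skip malformed records
        print("%s\t1" % record.tag)

    #!/usr/bin/env python
    # reducer.py -- sums the counts per tag emitted by mapper.py (streaming input arrives sorted by key).
    import sys

    current_tag, count = None, 0
    for line in sys.stdin:
        if not line.strip():
            continue
        tag, value = line.rstrip("\n").split("\t", 1)
        if tag == current_tag:
            count += int(value)
        else:
            if current_tag is not None:
                print("%s\t%d" % (current_tag, count))
            current_tag, count = tag, int(value)
    if current_tag is not None:
        print("%s\t%d" % (current_tag, count))

A typical (hypothetical) submission would look roughly like: hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /data/xml -output /data/xml_tag_counts.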

Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, OLAP, data modelling, Linux, Hadoop MapReduce, HBase, Shell Scripting, MongoDB, Cassandra, Apache Spark, Neo4j.

Confidential, Austin, TX

Hadoop Developer

Responsibilities:

  • Responsible for understanding the requirements and implementing the security using AD Groups for the Dataset.
  • Involved in low-level design for MapReduce, Hive, Impala, and shell scripts to process data.
  • Worked on ETL scripts to pull data from DB2/Oracle databases into HDFS.
  • Experience in utilizing Spark machine learning techniques implemented in Scala.
  • Involved in POC development and unit testing using Spark and Scala.
  • Created Partitioned Hive tables and worked on them using Hive.
  • Installing and configuring Hive, Sqoop, Flume, Oozie on the Hadoop clusters.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed and implemented Python/Django applications.
  • Developed a process for batch ingestion of CSV files and Sqoop extracts from different sources, and generated views on the data sources using shell scripting and Python.
  • Integrated a shell script to create collections/morphlines and Solr indexes on top of table directories using the MapReduce Indexer Tool within the batch ingestion framework.
  • Implemented partitioning, dynamic partitions and buckets in HIVE.
  • Developed Hive Scripts to create the views and apply transformation logic in the Target Database.
  • Involved in the design of Data Mart and Data Lake to provide faster insight into the Data.
  • Involved in using the StreamSets Data Collector tool and created data flows for one of the streaming applications.
  • Experienced in using Kafka as a data pipeline between JMS (producer) and a Spark Streaming application (consumer); a minimal PySpark sketch follows this list.
  • Involved in the development of a Spark Streaming application for one of the data sources using Scala and Spark, applying the required transformations.
  • Skilled in using collections in Python for manipulating and looping through different user-defined objects.
  • Wrote a Python module to connect to an Apache Cassandra instance and view its status (see the sketch after this list).
  • Developed a script in Scala to read all the Parquet Tables in a Database and parse them as Json files, another script to parse them as structured tables in Hive.
  • Designed and Maintained Oozie workflows to manage the flow of jobs in the cluster.
  • Configured Zookeeper for Cluster co-ordination services.
  • Generated Python Django forms to record data of online users and used PyTest for writing test cases.
  • Developed a unit test script to read a Parquet file for testing PySpark on the cluster.
  • Involved in exploring new technologies such as AWS, Apache Flink, and Apache NiFi that can increase business value. Worked on NoSQL databases, which differ from classic relational databases.
  • Conducted requirements-gathering sessions with various stakeholders.
  • Involved in knowledge transition activities to the team members.
  • Successful in creating and implementing complex code changes.
  • Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Configured EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
  • Experience with S3, CloudFront, and Route 53.
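
The Kafka-to-Spark pipeline referenced above can be sketched in PySpark. This is a hedged example using Spark Structured Streaming (a newer API than the DStream-based Spark Streaming named in the bullet); broker addresses, the topic name, and output paths are hypothetical, and the Spark-Kafka connector package is assumed to be on the classpath.

    # Hedged sketch: consume a Kafka topic fed by a JMS bridge and land it on HDFS.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-consumer-sketch").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
              .option("subscribe", "jms-bridge-topic")
              .option("startingOffsets", "latest")
              .load())

    # Kafka delivers the payload as bytes; cast it to string before transforming.
    parsed = events.select(F.col("value").cast("string").alias("payload"),
                           F.col("timestamp"))

    query = (parsed.writeStream
             .format("parquet")
             .option("path", "/data/streams/events")
             .option("checkpointLocation", "/data/streams/_checkpoints/events")
             .outputMode("append")
             .start())

    query.awaitTermination()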
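
A Python module for checking Cassandra status, as mentioned above, could look roughly like the following. Host names and the port are hypothetical, and the DataStax cassandra-driver package is assumed to be installed.

    # Hedged sketch: connect to Cassandra and report basic cluster status.
    from cassandra.cluster import Cluster

    def cassandra_status(contact_points=("cassandra-host",), port=9042):
        """Return the cluster name, release version, and known hosts."""
        cluster = Cluster(list(contact_points), port=port)
        session = cluster.connect()
        row = session.execute(
            "SELECT cluster_name, release_version FROM system.local").one()
        status = {
            "cluster_name": row.cluster_name,
            "release_version": row.release_version,
            "known_hosts": [str(h.address) for h in cluster.metadata.all_hosts()],
        }
        cluster.shutdown()
        return status

    if __name__ == "__main__":
        print(cassandra_status())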

Environment: Hadoop, HDFS, Hive, HBase, Zookeeper, Impala, Cloudera, Oracle, SQL Server, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Python, Kafka, PySpark.

Confidential, Farmington Hills, MI

Hadoop Developer

Responsibilities:

  • Responsible for gathering requirements from the business partners.
  • Developed a process for Sqooping data from multiple sources like SQL Server, Oracle and Teradata.
  • Responsible for creating the source-to-destination field mapping document.
  • Developed a shell script to create staging, landing tables with the same schema as the source and generate the properties which are used by Oozie jobs.
  • Developed Oozie workflow's for executing Sqoop and Hive actions.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
  • Involved in building the database model, APIs, and views using Python to build an interactive web-based solution.
  • Performed performance optimizations on Spark/Scala; diagnosed and resolved performance issues.
  • Responsible for developing Python wrapper scripts that extract a specific date range using Sqoop by passing the custom properties required for the workflow (see the sketch after this list).
  • Developed scripts to run Oozie workflows, capture the logs of all jobs that run on cluster and create a metadata table which specifies the execution times of each job.
  • Developed Hive scripts for performing transformation logic and loading the data from staging zone to final landing zone.
  • Developed monitoring and notification tools using Python.
  • Worked on the Parquet file format to get better storage and performance for published tables.
  • Involved in loading transactional data into HDFS using Flume for Fraud Analytics.
  • Developed Python utility to validate HDFS tables with source tables.
  • Designed and developed UDF'S to extend the functionality in both PIG and HIVE.
  • Imported and exported data using Sqoop between MySQL and HDFS on a regular basis.
  • Managed datasets using pandas DataFrames and MySQL; queried MySQL databases from Python using the Python MySQL connector and MySQLdb packages to retrieve information (a brief sketch follows this list).
  • Developed and tested many features for dashboard using Python, Java, Bootstrap, CSS, JavaScript and jQuery.
  • Responsible for checking developed code into Harvest for release management as part of CI/CD.
  • Involved in using the CA7 tool to set up dependencies at each level (table data, file, and time).
  • Automated all the jobs for pulling data from FTP server to load data into Hive tables using Oozie workflows.
  • Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizations using SparkContext, Spark SQL, pair RDDs, and Spark on YARN.
  • Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS. Worked with AWS to migrate entire data centers to the cloud using VPC, EC2, S3, EMR, RDS, Splice Machine, and DynamoDB services.
  • Created the data model for Hive tables.
  • Involved in Unit testing and delivered Unit test plans and results in documents.
  • Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
  • Worked on Oozie workflow engine for job scheduling.
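
The Python wrapper around Sqoop described above can be sketched as follows. This is a hedged example: the JDBC connection string, schema, table, and HDFS paths are hypothetical, and it simply shells out to a standard sqoop import with a free-form query.

    # Hedged sketch: build and run a Sqoop free-form query import for a date range.
    import subprocess
    import sys

    def sqoop_import_range(start_date, end_date):
        """Import rows with order_date in [start_date, end_date) into a dated HDFS directory."""
        query = ("SELECT * FROM sales.orders "
                 "WHERE order_date >= '{0}' AND order_date < '{1}' "
                 "AND $CONDITIONS").format(start_date, end_date)
        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:oracle:thin:@//db-host:1521/ORCL",
            "--username", "etl_user",
            "--password-file", "/user/etl/.sqoop_pwd",
            "--query", query,
            "--split-by", "order_id",
            "--target-dir", "/data/staging/orders/{0}".format(start_date),
            "--num-mappers", "4",
        ]
        return subprocess.call(cmd)

    if __name__ == "__main__":
        sys.exit(sqoop_import_range(sys.argv[1], sys.argv[2]))

Sqoop requires the $CONDITIONS token in a free-form --query; because the command is passed as an argument list (no shell), the literal token reaches Sqoop unchanged.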
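
The pandas/MySQL validation step referenced above can be sketched like this. Host, credentials, and the audit table name are hypothetical; the mysql-connector-python and pandas packages are assumed.

    # Hedged sketch: pull per-day row counts from MySQL into a pandas DataFrame
    # so they can be compared against counts computed on the HDFS side.
    import mysql.connector
    import pandas as pd

    conn = mysql.connector.connect(
        host="mysql-host", user="report_user", password="secret", database="reporting")

    df = pd.read_sql(
        "SELECT load_date, COUNT(*) AS row_count "
        "FROM staging_audit GROUP BY load_date ORDER BY load_date",
        conn)

    print(df.tail())
    conn.close()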

Environment: Hadoop, HDFS, Hive, HBase, Zookeeper, Oozie, Impala, Java (JDK 1.6), Cloudera, Oracle, Teradata, SQL Server, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Python.

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Understood business needs and analyzed functional specifications, mapping them to development work.
  • Involved in loading data from Mainframe DB2 into HDFS using Sqoop.
  • Handled delta processing and incremental updates using Hive (a brief sketch follows this list).
  • Responsible for daily ingestion of data from DATALAKE to CDB Hadoop tenant system.
  • Developed Pig Latin scripts for transformations while extracting data from source systems.
  • Worked on data-issue tickets and provided fixes.
  • Monitored and fixed production job failures.
  • Reviewed team members' design documents and code.
  • Documented the systems processes and procedures for future references including design and code reviews.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Implemented data ingestion from multiple sources like IBM Mainframes, Oracle.
  • Developed transformations and aggregated the data for large data sets using Pig and Hive scripts.
  • Worked on partitioning and used bucketing in HIVE tables and running the scripts in parallel to improve the performance.
  • Increased HiveQL performance by splitting larger queries into smaller ones and introducing temporary tables between them.
  • Extensively involved in performance tuning of HiveQL by bucketing large Hive tables.
  • Used an open-source web scraping framework for Python to crawl and extract data from web pages.
  • Optimized Hive queries by setting different combinations of Hive parameters.
  • Developed UDFs (User Defined Functions) to extend the core functionality of Pig and Hive queries as per requirements.
  • Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
  • Used the Spark API over Hadoop to analyze data in Hive.
  • Thorough knowledge of Spark architecture and how RDDs work internally.
  • Experience in the Scala programming language, used extensively with Spark for data processing.
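
The delta/incremental update handling mentioned above follows a common Hive pattern: union the base table with the day's delta and keep only the newest version of each key. The sketch below is hedged; table and column names are hypothetical, and a Hive-enabled SparkSession is assumed.

    # Hedged sketch of an incremental (delta) merge expressed with PySpark over Hive tables.
    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-incremental-merge-sketch")
             .enableHiveSupport()
             .getOrCreate())

    base = spark.table("warehouse.accounts")          # full historical snapshot
    delta = spark.table("staging.accounts_delta")     # today's incremental extract

    # Rank records per business key by update timestamp and keep only the newest one.
    latest = Window.partitionBy("account_id").orderBy(F.col("updated_at").desc())
    merged = (base.unionByName(delta)
              .withColumn("rn", F.row_number().over(latest))
              .filter(F.col("rn") == 1)
              .drop("rn"))

    # Rewrite the reconciled snapshot; a production job would usually write to a
    # temporary table first and then swap it in.
    merged.write.mode("overwrite").saveAsTable("warehouse.accounts_reconciled")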

Environment: HDFS, Hive, Pig, HBase, Unix Shell Script, Talend, Spark, Scala.

Confidential

Java Developer

Responsibilities:

  • Developed JSPs and Servlets using Struts framework.
  • Involved in the Requirements collection & Analysis from the business team.
  • Created the design documents with use case diagram, class diagrams, sequence diagrams using Rational Rose.
  • Implemented the MVC architecture using Apache Struts 1.2 Framework.
  • Developed a mailing system using the core Java Mail API to notify staff when a customer submitted a policy.
  • Implemented session beans to handle business logic for fund transfer, loan, credit card and fixed deposit modules.
  • Worked with various Java patterns such as Service Locator and Factory Pattern at the business layer for effective object behaviours.
  • Worked on the JAVA Collections API for handling the data objects between the business layers and the front end.
  • Developed Unit test cases using JUnit.
  • Implemented the background work using Multi-threading which sends mails in bulk behind the scene.
  • Implemented Tiles Framework for the views layout.
  • Used Clear Case for source code maintenance.
  • Developed the import/export utility, which allows user to import/export to/from MS-Excel, CSV and Database.
  • Implemented the web services client to consume the third-party service API for validating credit cards.
  • Used Java Messaging Services (JMS) for reliable and asynchronous exchange of important information such as payment status report.
  • Developed Ant scripts and builds using Apache Ant. Involved in integrating the system with BT's systems such as GTC and CSS, through the eLink hub and IBM MQ Series.
  • Developed, deployed, and tested JSPs and Servlets in WebLogic.
  • Used Eclipse as the IDE, integrated WebLogic with Eclipse to deploy and develop the applications, and used JDBC to connect to the database.

Environment: Struts framework, core Java, J2EE (Java, JNDI, JSP, JDBC, Servlets), Apache Tomcat, SQL, HTML, Eclipse, Windows XP, MVC design pattern, MySQL, Rational Rose, SOAP, Tiles.
