
Big Data/Hadoop Developer Resume


Birmingham, AL

PROFESSIONAL SUMMARY:

  • 9+ years of professional experience in IT, including 4 years of comprehensive experience working with Apache Hadoop ecosystem components, Spark Streaming, and Amazon Web Services (AWS).
  • Strong working expertise in handling terabytes of structured and unstructured data on large cluster environments.
  • Good understanding of HDFS design, daemons, and HDFS High Availability (HA).
  • Good understanding of and working experience with Hadoop distributions such as Cloudera and Hortonworks.
  • Good knowledge of creating event-processing data pipelines using Flume, Kafka, and Storm.
  • Expertise in data transformation and analysis using Spark, Pig, and Hive.
  • Used Apache Solr to query indexed data for analytics and to accelerate search operations.
  • Implemented MLlib functions for training and building classifier models using Spark Streaming, Spark SQL, and machine learning APIs.
  • Worked on real-time data integration using Kafka-Storm data pipelines, Spark Streaming, and HBase.
  • Worked on ingesting, reconciling, compacting, and purging base-table and incremental-table data using Hive and HBase, with job scheduling through Oozie.
  • Hands-on experience with AWS components such as VPC, EC2, EBS, Redshift, and CFT.
  • Worked on all major Hadoop ecosystem components, such as HDFS, Hive, Pig, Oozie, Sqoop, MapReduce, and YARN, on Cloudera, MapR, and Hortonworks distributions.
  • Worked on setting up AWS EMR and EC2 clusters as well as multi-node Hadoop clusters in development environments.
  • Developed scripts and batch jobs to monitor and schedule various Spark jobs.
  • Worked on importing and exporting data between databases such as Oracle, Teradata, MySQL, and IBM DB2 and HDFS/Hive using Sqoop.
  • Worked on collecting streaming data into HDFS using Kafka, Flume, and Flink.
  • Wrote Spark SQL, Cassandra CQL, HiveQL, and Pig Latin queries for data analysis to meet business requirements.
  • Created base and incremental tables with partitioning and bucketing, and created UDFs in Hive with optimization and tuning.
  • Worked on NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Experience with the Oozie workflow engine to automate and parallelize Hadoop MapReduce, Hive, and Pig jobs.
  • Implemented automated workflows and job scheduling using Oozie, ZooKeeper, and Ambari.
  • Good working experience with Hadoop cluster architecture and cluster monitoring; in-depth understanding of data structures and algorithms.
  • Built and configured Apache Tez for Hive and Pig to achieve better response times than MapReduce jobs.
  • Extended Hive and Pig core functionality by writing custom UDFs, UDTFs, and UDAFs (see the UDF sketch after this list).
  • Implemented proofs of concept on the Hadoop stack and various big data analytics tools, including migrations from different databases to Hadoop.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on formats such as text, Avro, and Parquet files.
  • Worked on Software Development Life Cycle (SDLC)
  • Experienced in creating PL/SQL stored procedures, functions, and cursors against Oracle (10g, 11g).
  • Key participant in all phases of the software development life cycle, including analysis, design, development, integration, implementation, debugging, and testing of software applications in client-server, object-oriented, and web-based environments.
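
To illustrate the custom Hive UDF work mentioned above, here is a minimal sketch of a simple UDF. The text-normalization use case and class name are assumptions for illustration, not taken from a specific project.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Minimal sketch of a custom Hive UDF (simple UDF API); the use case is assumed.
    public final class NormalizeText extends UDF {
        // Trims and lower-cases a string column; returns null for null input.
        public Text evaluate(final Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

After packaging the class into a JAR, it would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.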

TECHNICAL SKILLS:

Big Data Technologies: Apache Spark, Apache Solr, HDFS, MapReduce, Pig, Sqoop, Hive, HBase, Kafka, Flume, Oozie, YARN

AWS Components: VPC, EC2, EBS, Redshift, CFT, EMR, CloudWatch

Programming Languages: Scala, Python, R and Java

Java Technologies: Struts, J2SE, JDBC, JavaScript, HTML, XML

Servers: WebLogic and Tomcat

IDEs and Tools: Lucidworks Fusion, NetBeans, Eclipse.

Databases: Oracle 10g, PL/SQL, MySQL, and NoSQL (HBase, MongoDB, Cassandra)

Operating Systems: Windows 7, Windows XP, Unix, Linux, CentOS, Solaris

PROFESSIONAL EXPERIENCE:

Confidential, Birmingham, AL

Big Data/Hadoop Developer

Responsibilities:

  • Involved in the complete project life cycle, from design discussions to production deployment.
  • Installed Hadoop, MapReduce, and HDFS on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Developed a job server (REST API, Spring Boot, Oracle DB) and a job shell for job submission, job profile storage, and job data (HDFS) querying/monitoring.
  • Implemented solutions for ingesting data from various sources and processing it using big data technologies such as Hive, Pig, Sqoop, HBase, and MapReduce.
  • Designed and developed a daily process for incremental import of raw data from DB2 into Hive tables using Sqoop.
  • Involved in debugging MapReduce jobs using the MRUnit framework and in optimizing MapReduce code.
  • Extensively used HiveQL to query data in Hive tables and to load data into them.
  • Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest data into HDFS for analysis.
  • Used Oozie and Zookeeper for workflow scheduling and monitoring.
  • Effectively used Sqoop to transfer data from relational databases (SQL, Oracle) to HDFS and Hive.
  • Integrated Apache Storm with Kafka to perform web analytics, loading clickstream data from Kafka into HDFS, HBase, and Hive (see the Storm bolt sketch after this list).
  • Designed Hive external tables using a shared metastore instead of Derby, with dynamic partitioning and bucketing.
  • Worked on big data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Designed and implemented ETL processes using Talend, and worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes, loading data into HDFS.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Enabled concurrent access to Hive tables with shared/exclusive locks by implementing ZooKeeper in the cluster.
  • Strongly recommended bringing in Elasticsearch and was responsible for its installation, configuration, and administration.
  • Used Scala and SQL for faster testing and processing of data, and streamed data in real time using Kafka.
  • Used Oozie operational services for batch processing and dynamic workflow scheduling; created end-to-end data pipeline orchestration using Oozie.
  • Populated HDFS and Cassandra with massive amounts of data using Apache Kafka.
  • Involved in designing and developing Kafka- and Storm-based data pipelines with the infrastructure team.
  • Worked on major Hadoop ecosystem components including Hive, Pig, HBase, HBase-Hive integration, Scala, Sqoop, and Flume.
  • Developed Hive scripts, Pig scripts, and Unix shell scripts for all ETL loading processes and for converting files into Parquet in HDFS.
  • Worked with Oozie and Zookeeper to manage job workflow and job coordination in the cluster.
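
As referenced in the Storm-Kafka bullet above, the clickstream integration can be illustrated with a minimal Storm bolt sketch. It assumes Storm 1.x package names; the tuple field names and per-URL counting logic are hypothetical, and an upstream Kafka spout would feed the bolt one tuple per click event.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    // Minimal sketch of a Storm bolt that counts clickstream hits per URL;
    // downstream bolts would persist the running counts to HBase or HDFS.
    public class PageHitCountBolt extends BaseBasicBolt {
        private final Map<String, Long> counts = new HashMap<>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String url = tuple.getStringByField("url"); // field name is an assumption
            long count = counts.merge(url, 1L, Long::sum);
            collector.emit(new Values(url, count));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("url", "count"));
        }
    }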

Environment: Hadoop, Hive, Impala, Oracle, Spark, Pig, Sqoop, Oozie, MapReduce, SQL

Confidential, Santa Clara, CA

Big Data/Hadoop Developer

Responsibilities:

  • Gathered business requirements from customers, analyzed them, and identified impacts.
  • Created sessions, tasks and workflows using Alteryx.
  • Used Alteryx to perform end-to-end ETL processes, which included developing mappings, extracting data from sources (flat files and Oracle Database 11g), cleansing and transforming the data to meet specific business requirements/rules, and loading the data into the target.
  • Worked on big data streaming analytics for building predictive machine learning models using Scala, Python, and R.
  • Created Technical Documentation for all the developed mappings.
  • Used Apache Kafka for handling real-time user data feeds.
  • Used Kafka Direct Stream API to connect Kafka with Spark Streaming.
  • Developed a Spark Streaming application in Java that connects to Kafka and fetches data in real time (see the sketch after this list).
  • Used Kafka to load streaming data into HDFS.
  • Developed multiple Kafka Producers and Consumers as per the software requirement specifications.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Used Hue for UI-based Pig script execution, Tidal scheduling, and creating tables in Hive.
  • Worked with the BSA team to generate reports in Tableau.
  • Experience working with Spark machine learning and SPSS.
  • Worked with sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Involved in analysis, design, implementation, and bug-fixing activities; handled structured and unstructured data and applied ETL processes; responsible for building scalable distributed data pipelines using Hadoop.
  • Worked on developing ETL processes to load data from Oracle data sources into HDFS using Sqoop and to perform structural transformations.
  • Worked on requirements gathering and analysis and translated them into technical designs.
  • Used Jenkins scheduler to schedule the ETL workflows.
  • Extensively used UNIX shell scripting and pulled logs from the servers.
  • Conducted daily stand-ups with the offshore team, updating them on applicable tasks and gathering updates for the onshore team.
  • Coordinated with offshore and onsite teams to understand requirements and prepared high-level and low-level design documents from the requirements specification.
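
As referenced above, the Java Spark Streaming application that consumes from Kafka through the direct stream API can be sketched roughly as follows. It assumes the spark-streaming-kafka-0-10 integration; the broker address, topic name, and consumer group id are placeholders.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    // Minimal sketch of a Java Spark Streaming job reading a Kafka topic
    // through the direct stream API (spark-streaming-kafka-0-10).
    public class KafkaStreamJob {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("KafkaStreamJob");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");  // placeholder broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "example-consumer-group"); // placeholder group id

            JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(Arrays.asList("events"), kafkaParams));

            // Count records per micro-batch; a real job would transform and write to HDFS.
            stream.map(ConsumerRecord::value)
                  .count()
                  .print();

            jssc.start();
            jssc.awaitTermination();
        }
    }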

Environment: HDFS, Spark, ETL, Sqoop, Jenkins, Collibra, Oracle Database, Cloudera Distribution for Hadoop 3 (cdh3u5), Linux, Kafka

Confidential, San Jose, CA

Big Data/Hadoop Developer

Responsibilities:

  • Worked on streaming analytics and data consolidation projects on the Lucidworks Fusion product.
  • Integrated Kafka, Spark, Scala, and HBase for streaming analytics to create a predictive model, and implemented machine learning workflows.
  • Developed Scala scripts and UDFs using Spark DataFrames for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Used Solr/Lucene for indexing and querying JSON-formatted data (see the SolrJ sketch after this list).
  • Handled cloud operations inside Rackspace for persistence logic.
  • Monitored OOTB requests with ATG, Akamai and TIBCO.
  • Used REST services for handling unfinished jobs, checking job status, and creating datasets at a given URL.
  • Worked on ingesting, reconciling, compacting, and purging base-table and incremental-table data using Hive and HBase, with job scheduling through Oozie.
  • Designed and implemented streaming data display in the UI with Scala.js.
  • Utilized DevOps practices to ensure operational excellence before deploying to production.
  • Operated the cluster on AWS using EC2, Akka, EMR, S3, and CloudWatch.
  • Transported data to HBase using Flume.
  • Used Java UDFs for performance tuning in Hive and Pig by manually controlling the MapReduce stage.
  • Used Java APIs such as machine learning library functions and graph algorithms for training and prediction with linear models in Spark Streaming.
  • Implemented unit testing in Java for Pig and Hive applications.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Involved in source system analysis, data analysis, and data modeling for ETL (extract, transform, and load).
  • Wrote Spark programs to model data for extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed formats.
  • Worked on loading RDBMS data into Hadoop/HDFS using Sqoop.
  • Developed Pig scripts and UDFs to manipulate and transform the loaded data.
  • Created Hive tables and loaded data into Hive using UDFs and partitioning and bucketing principles.
  • Migrated Hive data into HBase.
  • Involved in integrations among Pig, Hive, and HBase.
  • Instrumental in debugging; created Hive external tables to store processed data from MapReduce.
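
As referenced above, querying the indexed JSON data can be sketched with SolrJ. This is a minimal illustration only; the Solr URL, collection name, and field names are assumptions.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    // Minimal sketch of querying a Solr collection with SolrJ.
    public class SolrQueryExample {
        public static void main(String[] args) throws Exception {
            // Base URL and collection name are placeholders.
            String solrUrl = "http://localhost:8983/solr/products";
            try (SolrClient client = new HttpSolrClient.Builder(solrUrl).build()) {
                SolrQuery query = new SolrQuery("category:electronics"); // assumed field and value
                query.setRows(10);
                query.addSort("price", SolrQuery.ORDER.asc);             // assumed sortable field

                QueryResponse response = client.query(query);
                for (SolrDocument doc : response.getResults()) {
                    System.out.println(doc.getFieldValue("id") + " -> " + doc.getFieldValue("name"));
                }
            }
        }
    }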

Environment: Hadoop 0.20, HDFS, Hive, Pig, Sqoop, Java, Cloudera Distribution for Hadoop 3 (cdh3u5), Linux.

Confidential, Northern, NJ

Spark/Hadoop Developer

Responsibilities:

  • Developed multiple Spark jobs in PySpark for data cleaning and Pre-processing.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed simple/complex MapReduce jobs using Hive and Pig.
  • Migrated existing MapReduce programs to Spark using Scala and Python
  • Loaded and transformed large sets of structured, semi structured and unstructured data.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install Operating Systems, Hadoop updates, patches, and version upgrades as required.
  • Used APIs such as machine learning functions and graph algorithms for training and prediction with linear models in Spark Streaming.
  • Wrote Pig UDFs for converting date and timestamp formats in the unstructured files into the required formats and processed the results (see the Pig UDF sketch after this list).
  • Created 30 buckets for each Hive table, clustered by client ID, for better performance while updating the tables.
  • Wrote Apache Pig scripts to process the HDFS data.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Sqoop scripts to enable interaction between Pig and the SQL database.
  • Wrote script files for processing data and loading it to HDFS.
  • Wrote HDFS CLI commands.
  • Developed UNIX shell scripts for creating reports from Hive data.
  • Completely involved in the requirement analysis phase.
  • Moved all log/text files generated by various products into an HDFS location.
  • Created external Hive tables on top of the parsed data.
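
As referenced above, the Pig UDF for converting date and timestamp formats can be sketched as follows. The input and output formats shown are assumptions for illustration; the real scripts would use whatever formats the source files contained.

    import java.io.IOException;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Minimal sketch of a Pig EvalFunc that converts "MM/dd/yyyy" strings
    // to "yyyy-MM-dd"; the formats are assumed for illustration.
    public class ConvertDateFormat extends EvalFunc<String> {
        private final SimpleDateFormat inputFormat = new SimpleDateFormat("MM/dd/yyyy");
        private final SimpleDateFormat outputFormat = new SimpleDateFormat("yyyy-MM-dd");

        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            try {
                return outputFormat.format(inputFormat.parse(input.get(0).toString()));
            } catch (ParseException e) {
                // Malformed dates are passed through as null rather than failing the job.
                return null;
            }
        }
    }

In a Pig script, such a class would be packaged into a JAR, registered with REGISTER, given an alias with DEFINE, and then applied inside a FOREACH ... GENERATE statement.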

Environment: CDH4, MapReduce, HDFS, Hive, Pig, Sqoop, Linux, XML, PL/SQL, SQL connector

Confidential, Grand Blanc, MI

Hadoop Developer

Responsibilities:

  • Imported and exported data in HDFS and Hive using Sqoop.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Wrote Hive UDFs to extract data from staging tables.
  • Analyzed web log data using HiveQL and processed it through Flume.
  • Responsible for loading customer data and event logs from Oracle, MySQL, and Teradata into HDFS using Sqoop.
  • Implemented MapReduce programs to analyze large datasets in the warehouse for business intelligence.
  • End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce.
  • Experience optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization (see the combiner sketch after this list).
  • Wrote Storm spouts and bolts to collect real-time customer stream data from the Kafka broker, process it, and store it in HBase.
  • Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
  • Developed simple to complex Map/Reduce jobs using Java, and scripts using Hive and Pig.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) for data ingestion
  • Experienced in loading and transforming large sets of structured and semi-structured data.
  • Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
  • Involved in creating Hive tables, loading with data and writing Hive queries.
  • Created data-models for customer data using the Cassandra Query Language.
  • Ran many performance tests using the cassandra-stress tool to measure and improve read performance, and queried and analyzed data from DataStax Cassandra for quick searching, sorting, and grouping.
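
As referenced in the combiner bullet above, the optimization can be illustrated with a simple counting job that reuses its reducer as a combiner. The tab-delimited input and the choice of the first field as the key are assumptions; using the reducer as a combiner is valid here because the sum aggregation is associative and commutative.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Minimal sketch of a counting job that uses its reducer as a combiner
    // to cut shuffle volume; the input layout is an assumption.
    public class EventCountJob {

        public static class EventMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            private final Text outKey = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t"); // assumed tab-delimited input
                outKey.set(fields[0]);                          // assumed: first field is the event type
                context.write(outKey, ONE);
            }
        }

        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "event count");
            job.setJarByClass(EventCountJob.class);
            job.setMapperClass(EventMapper.class);
            job.setCombinerClass(SumReducer.class); // combiner pre-aggregates on the map side
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }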

Environment: CDH4, MapReduce, HDFS, Hive, Pig, Sqoop, Linux, XML, PL/SQL, SQL connector

Confidential, Atlanta, GA

Hadoop Developer

Responsibilities:

  • Responsible for loading customer data and event logs from Oracle and Teradata into HDFS using Sqoop.
  • End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Loaded data from the Linux file system to HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop and Flume.
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
  • Wrote MapReduce jobs to generate reports on the number of activities created per day from data dumped from multiple sources, with the output written back to HDFS.
  • Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Prepared a Tez build from source code and ran Hive query jobs using the Tez execution engine rather than MapReduce for better performance.
  • Participated in the requirements gathering, design, development, testing, and analysis phases of the project, documenting business requirements by conducting workshops/meetings with various business users.
  • Participated in client calls to gather and analyze the requirement.
  • Imported and exported data between databases and HDFS using Sqoop.
  • Involved in writing Hive queries and Pig script files for processing data and loading it to HDFS, which internally run as MapReduce jobs.
  • Wrote Hive queries and Pig scripts for data analysis to meet business requirements.
  • Highly knowledgeable in Hadoop administration, with extensive experience building, configuring, and administering large data clusters in big data environments using the Apache distribution.
  • Knowledge of Hive-HBase integration and Pig-HBase integration.
  • Used Spark Streaming APIs to perform on-the-fly transformations and actions for building the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
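
As referenced in the last bullet, persisting a Kafka-fed stream into Cassandra can be sketched as follows. This version uses the DataStax Java driver inside foreachPartition; the keyspace, table, columns, and record layout are assumptions, and in practice the spark-cassandra-connector could be used instead.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;
    import org.apache.spark.streaming.api.java.JavaDStream;

    // Minimal sketch of persisting a DStream of (learnerId, score) records
    // into Cassandra; keyspace, table, and columns are assumed.
    public final class CassandraSink {

        public static void persist(JavaDStream<String[]> records) {
            records.foreachRDD(rdd -> rdd.foreachPartition(partition -> {
                // Open one connection per partition rather than per record.
                try (Cluster cluster = Cluster.builder().addContactPoint("cassandra-host").build(); // placeholder host
                     Session session = cluster.connect("learner_ks")) {                             // assumed keyspace
                    PreparedStatement stmt = session.prepare(
                            "INSERT INTO learner_model (learner_id, score) VALUES (?, ?)");         // assumed table
                    while (partition.hasNext()) {
                        String[] fields = partition.next();
                        session.execute(stmt.bind(fields[0], Double.parseDouble(fields[1])));
                    }
                }
            }));
        }
    }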

Environment: Hadoop, MapReduce, HDFS, Hive, Flume, Sqoop, Cloudera, Oozie, Data Stage, Java Cron Jobs, UNIX Scripts, Spark

Confidential

Java/J2EE Developer

Responsibilities:

  • Developed Servlets and JavaServer Pages (JSP).
  • Wrote pseudo-code for stored procedures.
  • Developed PL/SQL queries to generate reports based on client requirements.
  • Enhancement of the System according to the customer requirements.
  • Used JavaScript validation in JSP pages.
  • Helped design the database tables for optimal storage of data.
  • Coded JDBC calls in the servlets to access the Oracle database tables (see the sketch after this list).
  • Responsible for integration, unit testing, system testing, and stress testing for all phases of the project.
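
As referenced above, a JDBC call from one of the servlets can be sketched as follows. The table, columns, and DataSource JNDI name are placeholders, and the sketch uses modern try-with-resources for brevity even though the original application targeted an older Java version.

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import javax.naming.InitialContext;
    import javax.naming.NamingException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.sql.DataSource;

    // Minimal sketch of a servlet issuing a JDBC query against Oracle
    // through a container-managed DataSource; names are placeholders.
    public class CustomerLookupServlet extends HttpServlet {

        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            String customerId = request.getParameter("id");
            response.setContentType("text/plain");
            PrintWriter out = response.getWriter();
            try {
                DataSource ds = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/AppDS");
                try (Connection conn = ds.getConnection();
                     PreparedStatement ps = conn.prepareStatement(
                             "SELECT name FROM customers WHERE customer_id = ?")) {
                    ps.setString(1, customerId);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            out.println(rs.getString("name"));
                        }
                    }
                }
            } catch (NamingException | SQLException e) {
                throw new ServletException("Customer lookup failed", e);
            }
        }
    }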

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in Design and Development of the System using Rational Rose and UML.
  • Involved in Business Analysis and developed Use Cases, Program Specifications to capture the business functionality.
  • Improved coding standards, code reuse, and performance of the Extend application by making effective use of various design patterns (Business Delegate, View Helper, DAO, Value Object, and other basic patterns).
  • Involved in designing and developing dynamic web pages using JSF.
  • Design of system using JSPs, Servlets.
  • Designed the application using Process Object, DAO, Data Object, Value Object, Factory, and Delegation patterns (see the DAO sketch after this list).
  • Involved in the design and development of Presentation Tier using JSP, HTML and JavaScript.
  • Designed and developed Class diagram, Identifying Objects and its interaction to specify Sequence diagrams for the System using Rational Rose.
  • Performed client-side validations using JavaScript.
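
As referenced above, the DAO and Value Object patterns used in the design can be sketched as follows. The Order entity, fields, and method names are illustrative only, not taken from the actual system; a JDBC-backed implementation of the DAO interface would then be supplied through a Factory, matching the Factory and Delegation patterns listed above.

    // Value Object: a simple serializable carrier of data between tiers.
    public class OrderVO implements java.io.Serializable {
        private String orderId;
        private double amount;

        public String getOrderId() { return orderId; }
        public void setOrderId(String orderId) { this.orderId = orderId; }
        public double getAmount() { return amount; }
        public void setAmount(double amount) { this.amount = amount; }
    }

    // DAO: hides the persistence mechanism (JDBC here) behind an interface so the
    // presentation and business tiers never touch SQL directly.
    interface OrderDao {
        OrderVO findById(String orderId);
        void save(OrderVO order);
    }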

Environment: Java 1.5, Servlets, J2EE 1.4, JDBC, Oracle 10g, PL/SQL, HTML, JSP, Eclipse, UNIX.
