Sr. Hadoop Developer Resume

Cleveland, OH

SUMMARY:

  • Adept and experienced Hadoop developer with over 7 years of programming experience and 5 years of experience in the Hadoop ecosystem and big data systems.
  • In-depth experience and solid subject knowledge of HDFS, MapReduce, Hive, Pig, Sqoop, YARN/MRv2, Spark, Kafka, Impala, HBase and Oozie.
  • Good experience working with Hortonworks Distribution, Cloudera Distribution and MapR Distribution
  • Substantial experience writing MapReduce jobs in Java and working with Pig, Flume, Tez and Hive.
  • Used Spark DataFrames, Spark SQL and the Spark RDD API for various data transformations and dataset building.
  • Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.
  • Strong fundamental understanding of distributed computing and distributed storage concepts for highly scalable data engineering.
  • Worked with Pig and Hive and developed custom UDFs for building various datasets.
  • Worked on MapReduce framework using Java programming language extensively.
  • Strong experience troubleshooting and performance-tuning Spark, MapReduce and Hive applications.
  • Worked with Click Stream Data extensively for creating various behavioral patterns of the visitors and allowing data science team to run various predictive models.
  • Worked on NoSQL data stores, primarily HBase, using the HBase Java API and Hive integration.
  • Experienced in monitoring cluster status using Cloudera Manager and Ambari.
  • Implemented dynamic partitions and buckets in Hive for efficient data access.
  • Significant experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
  • Strong expertise in Unix shell script programming.
  • Expertise in creating Shell-Scripts and Regular Expression.
  • Dexterous in visualizing data using Tableau, PowerBI and MS Excel.
  • Knowledge of Enterprise Data Warehouse (EDW) architecture, data modeling concepts such as star schema and snowflake schema, and Teradata.
  • Highly proficient in Scala programming.
  • Experience with web technologies including HTML, CSS, JavaScript, Ajax and JSON, and frameworks such as J2EE, AngularJS and Spring.
  • Good knowledge of REST web services, SOAP programming, WSDL, XML parsers such as SAX and DOM, AngularJS, and responsive design with Bootstrap.
  • Familiar with Agile and Waterfall methodologies. Handled several client-facing meetings with strong communication skills.
  • Good experience in a customer support role, resolving production issues based on priority.

TECHNICAL SKILLS:

Hadoop/Big Data Ecosystems: MapReduce, Spark, Spark SQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, ZooKeeper, Elasticsearch

Languages: C, C++, Java, Scala, Python, C#, SQL, PL/SQL

Frameworks: J2EE, Spring, Hibernate, Angular JS

Cluster Management and Monitoring: Cloudera Manager, Hortonworks Ambari

Databases: Oracle 11g, MySQL, SQL Server

Development Tools: Eclipse, NetBeans, Visual Studio, IntelliJ IDEA, XCode

Build Tools: ANT, Maven, sbt, Jenkins

Application Server: Tomcat 6.0, WebSphere 7.0

Business Intelligence Tools: Tableau, Splunk, PowerBI

Version Control: GitHub, Bitbucket, SVN

WORK EXPERIENCE:

Sr. Hadoop Developer

Confidential, Cleveland, OH

Responsibilities:

  • Gathered User requirements and designed technical and functional specifications.
  • Installed, configured and maintained Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
  • Loaded data from sources such as Teradata and DB2 into HDFS using Sqoop, then into partitioned Hive tables.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
  • Extended Hive and Pig core functionality with custom user-defined functions (UDFs), user-defined table-generating functions (UDTFs) and user-defined aggregating functions (UDAFs) written in Python.
  • Imported and exported the analysed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked on importing and exporting data into HDFS and Hive using Sqoop.
  • Used Flume to handle streaming data and loaded the data into Hadoop cluster.
  • Developed and executed hive queries for de-normalizing the data.
  • Developed an Apache Storm, Kafka and HDFS integration project for real-time data analysis.
  • Executed Hive queries using the Hive command line, the Hue web GUI and Impala to read, write and query data in HBase.
  • Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
  • Worked on a POC comparing the processing time of Impala with that of Apache Hive for batch applications, with the aim of adopting Impala in the project.
  • Worked on a 130-node cluster.
  • Designed an Apache Airflow entity-resolution module for data ingestion into Microsoft SQL Server.
  • Developed a batch processing pipeline using Python and Airflow, and scheduled Spark jobs with Airflow.
  • Involved in writing, testing, and running MapReduce pipelines using Apache Crunch.
  • Managed and reviewed Hadoop log files, analysed SQL scripts and designed a Spark-based solution for the process.
  • Created reports in Tableau to visualize the resulting data sets, and tested the native Drill, Impala and Spark connectors.
  • Developed Python scripts to find vulnerabilities in SQL queries through SQL injection tests, permission checks and performance analysis.
  • Imported data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.

Environment: Hadoop, YARN, HBase, Teradata, DB2, NoSQL, Kafka, Python, ZooKeeper, Oozie, Tableau, Apache Crunch, Apache Storm, MySQL, SQL Server, jQuery, JavaScript, HTML, Ajax and CSS.

Hadoop Developer

Confidential, Eagan, MN

Responsibilities:

  • Worked on a live 24 node Hadoop cluster running on HDP 2.2.
  • Built import and export jobs to copy data between RDBMS and HDFS using Sqoop.
  • Used Sqoop jobs with incremental loads to populate HAWQ external tables and load them into internal tables.
  • Created external and internal tables using HAWQ.
  • Worked with the Spark Core, Spark Streaming and Spark SQL modules.
  • Hands-on experience in various big data application phases such as data ingestion, data analytics and data visualization.
  • Experience in transferring data from RDBMS to HDFS and Hive tables using Sqoop.
  • Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
  • Very well versed in workflow scheduling and monitoring tools such as Oozie, Hue and Zookeeper.
  • Experience in working with Flume to load the log data from multiple sources directly into HDFS.
  • Installed and configured MapReduce, Hive and HDFS; implemented CDH5 and HDP clusters.
  • Assisted with performance tuning, monitoring, and troubleshooting.
  • Experience in data processing, including collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
  • Experience in handling streaming data on clusters through Kafka and Spark Streaming.
  • Optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Worked with AWS cloud services such as EC2, S3, EBS, RDS and VPC.
  • Experienced in reviewing Hadoop log files to detect failures.
  • Performed benchmarking of the NoSQL databases Cassandra and HBase.
  • Worked with Pig, the NoSQL database HBase and Sqoop to analyse the Hadoop cluster as well as big data.
  • Knowledge of workflow/schedulers like Oozie/crontab/Autosys.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Creating Hive tables and working on them for data analysis to meet the business requirements.
  • Developed a data pipeline using Spark and Hive to ingest, transform and analyse data.
  • Experience in using the SequenceFile, RCFile, Avro and HAR file formats.
  • Hands-on experience writing Pig scripts to tokenize sensitive information using Protegrity.
  • Used Flume to dump the application server logs into HDFS.
  • Automated backups with Linux shell scripts that transfer data to an S3 bucket.
  • Experience in UNIX Shell scripting.
  • Hands-on experience using HP ALM; created test cases and uploaded them into HP ALM.
  • Automated incremental loads to load data into production cluster.

Environment: Hadoop, MapReduce, AWS, HDFS, Hive, HBASE, Sqoop, Pig, Flume, Oracle, Teradata, PL/SQL, Java, Shell Scripting, HP ALM.

Hadoop Developer

Confidential, St Louis, MO

Responsibilities:

  • Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and pre-processing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Used Multithreading, synchronization, caching and memory management.
  • Applied Java application development skills with object-oriented analysis and was extensively involved throughout the software development life cycle (SDLC).
  • Proactively monitored systems and services; contributed to the architecture design and implementation of the Hadoop deployment, configuration management, backup and disaster recovery systems and procedures.
  • Built big data clusters using Apache Spark for analytics.
  • Developed Pig Latin scripts for the analysis of semi-structured data, and developed industry-specific user-defined functions (UDFs).
  • Created Hive tables, loaded data and wrote Hive UDFs.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Involved in loading data from UNIX file system to HDFS, configuring Hive and writing Hive UDFs.
  • Used Java and MySQL day to day to debug and fix issues with client processes.
  • Managed and reviewed log files.

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, MongoDB, Flume, HTML, XML, SQL, MySQL, Core Java, Eclipse, Shell scripting, UNIX.

Big Data Engineer/Developer

Confidential

Responsibilities:

  • Developed several advanced MapReduce programs to process received data files.
  • Developed MapReduce programs for data analysis and data cleaning.
  • Firm knowledge of summarization patterns for calculating aggregate statistical values over datasets.
  • Experience in implementing joins in the analysis of datasets to discover interesting relationships.
  • Fully involved in the requirement analysis phase.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Partitioned Hive tables and ran scripts in parallel to reduce their run time.
  • Strong expertise in internal and external Hive tables; created Hive tables to store the processed results in a tabular format.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Developed Pig Scripts and Pig UDFs to load data files into Hadoop.
  • Analyzed the data by performing Hive queries and running Pig scripts.
  • Developed Pig Latin scripts for the analysis of semi-structured and unstructured data.
  • Strong knowledge of the process of creating complex data pipelines using transformations, aggregations, cleansing and filtering.
  • Experience in writing cron jobs to run at regular intervals.
  • Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
  • Experience in using Flume to efficiently collect, aggregate and move large amounts of log data.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Experience in managing and reviewing Hadoop log files.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.

Environment: Hadoop 1.1.1, Java, Apache Pig 0.10.0, Apache Hive 0.10.0, MapReduce, HDFS, Flume 1.4.0, GIT, UNIX Shell scripting, PostgreSQL, Linux.

Java Developer

Confidential

Responsibilities:

  • Involved in Analysis, Design, Implementation and Bug Fixing Activities.
  • Designed the initial Web-WAP pages for a better UI as per the requirements.
  • Involved in Functional & Technical Specification documents review and the code review.
  • Acquired domain knowledge.
  • Involved in design of basic Class Diagrams, Sequence Diagrams and Event Diagrams as a part of Documentation.
  • Held discussions and meetings with the business analysts to understand the functionality involved in test case reviews.
  • Developed SQL queries and stored procedures using PL/SQL to retrieve data from and insert data into multiple database schemas.
  • Prepared the Support Guide containing the complete functionality.

Environment: Core Java, Apache Tomcat 5.1, Oracle 9i, JavaScript, HTML, PL/SQL, Rational Rose, Windows XP, UNIX.
