Sr. Hadoop Developer Resume

Atlanta, GA

SUMMARY

  • Over 8 years of experience in Big Data analytics, Hadoop, Java, database administration, and software development.
  • Strong hands-on experience with the Hadoop framework and its ecosystem, including HDFS architecture, MapReduce programming, Hive, Pig, Sqoop, HBase, ZooKeeper, Couchbase, Storm, Solr, Oozie, Spark, Scala, Flume, and Kafka.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs written in Java and Scala.
  • Experience in importing and exporting data into HDFS and Hive using Sqoop.
  • Integrated different data sources and performed data wrangling: cleaning, transforming, merging, and reshaping data sets by writing Python scripts.
  • Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
  • Hands-on experience installing and configuring Cloudera Hadoop ecosystem components such as Flume-NG, HBase, ZooKeeper, Oozie, Hive, Spark, Storm, Sqoop, Kafka, Hue, and Pig on CDH3 and CDH4 clusters.
  • Architected, designed, and maintained high-performing ELT/ETL processes.
  • Skilled in managing and reviewing Hadoop log files.
  • Experienced in loading data to Hive partitions and creating buckets in Hive
  • Experienced in configuring Flume to stream data into HDFS.
  • Experienced in real-time Big Data solutions using HBase, handling billions of records, and processing that data with the Spark Streaming API in Scala.
  • Familiarity with distributed coordination system Zookeeper.
  • Involved in designing and deploying a multitude of applications utilizing the entire AWS stack (including EC2, RDS, VPC, and IAM), focusing on high availability, fault tolerance, and auto-scaling.
  • Experienced in implementing unified data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
  • Good knowledge of building Apache Spark applications using Scala.
  • Experience in developing and designing POCs using Scala, deploying them on YARN clusters, and comparing the performance of Spark with Hive and SQL/Teradata.
  • Experience across the SDLC (analysis, design, development, integration, and testing) in diversified areas of client-server/enterprise applications using Java and J2EE technologies.
  • Administered, installed, upgraded, and managed Cassandra distributions.
  • Strong database development skills with Oracle, IBM DB2, and MySQL, and hands-on experience with SQL and PL/SQL; extensive backend database programming in Oracle environments using PL/SQL with tools such as TOAD.
  • Very good understanding of, and work experience with, relational databases such as MySQL and Oracle and NoSQL databases such as HBase, MongoDB, Couchbase, and Cassandra.
  • Good working experience with Java, JDBC, Servlets, and JSP.
  • Proficient in Java, J2EE, JDBC, Collections, Servlets, JSP, Struts, Spring, Hibernate, JAXB, JSON, XML, XSLT, XSD, JMS, WSDL, WADL, REST, SOAP web services, CXF, Groovy, Grails, Jersey, Gradle, and EclipseLink.
  • Good knowledge of performance troubleshooting and tuning Cassandra clusters, and of Cassandra data modeling based on application requirements.
  • Good knowledge of integrating various data sources such as RDBMS, spreadsheets, text files, JSON, and XML files.
  • Skilled in developing Python applications for multiple platforms, with familiarity with the Python software development process.

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop 2.1, HDFS, MapReduce, Pig 0.8, Hive 0.13, HBase 0.94, Sqoop 1.4.4, ZooKeeper 3.4.5, Storm, YARN, Spark Streaming, Spark SQL, Kafka, Scala, Cloudera CDH3/CDH4, Hortonworks, Oozie, Flume, Impala, Talend, Tableau/QlikView

Hadoop management & Security: Hortonworks Ambari, Cloudera Manager, Kafka

NoSQL Databases: MongoDB, Hbase, Redis, Couchbase and Cassandra

Web Technologies: DHTML, HTML, XHTML, XML, XSL (XSLT, XPATH), XSD, CSS, JavaScript, Servlets, SOAP, Amazon AWS

Server-Side Scripting: UNIX Shell Scripting

Database: Oracle 11g/10g/9i/8i, MS SQL Server 2012/2008, DB2 v8.1, MySQL, Teradata

Programming Languages: Java, J2EE, JSTL, JDBC 3.0/2.1, JSP 1.2/1.1

Scripting Languages: Python, Perl, Shell Scripting, JavaScript, Scala

OS/Platforms: Windows 7/2008/Vista/2003/XP/2000/NT, Macintosh, Linux (all major distributions, mainly CentOS and Ubuntu), UNIX

Client side: JavaScript, CSS, HTML, JQuery

Build tools: Maven and ANT

Methodologies: Agile, UML, Design Patterns, SDLC

Tools: FileZilla, PuTTY, TOAD SQL Client, MySQL Workbench, ETL, DWH, JUnit, Oracle SQL Developer

Office Tools: MS Office - Excel, Word, PowerPoint

PROFESSIONAL EXPERIENCE

Sr. Hadoop Developer

Confidential, Atlanta, GA

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Developed MapReduce pipeline jobs to process the data and create the necessary HFiles.
  • Involved in loading the created HFiles into HBase for faster access to a large customer base without taking a performance hit.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Involved in the creation and design of data ingest pipelines using technologies such as Apache Storm and Kafka.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Implemented discretization and binning, data wrangling: cleaning, transforming, merging and reshaping data frames using Python.
  • Created Hbase tables to store various data formats of PII data coming from different portfolios.
  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis
  • Involved in managing and reviewing Hadoop log files.
  • Responsible to manage data coming from different sources.
  • Involved in creating Pig tables, loading with data and writing Pig Latin queries which will run internally in Map Reduce way.
  • Experienced in using Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data onto HDFS.
  • Transferred the data using Informatica tool from AWS S3 to AWS Redshift. Involved in file movements between HDFS and AWS S3.
  • Created a complete processing engine, based on the Hortonworks distribution, enhanced for performance.
  • Provide batch processing solution to certain unstructured and large volume of data by using Hadoop Map Reduce framework.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Used Avro SerDes to handle Avro-format data in Hive and Impala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
  • Worked on Python scripts to load data from CSV, JSON, MySQL, and Hive sources into the Neo4j graph database.
  • Handled Administration, installing, upgrading and managing distributions of Cassandra.
  • Assisted in performing unit testing of Map Reduce jobs using MRUnit.
  • Assisted in exporting data into Cassandra and writing column families to provide fast listing outputs.
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Experience in integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
  • Experienced in working with the Spark ecosystem using Scala and Hive queries on different data formats such as text files and Parquet.
  • Exposure to using Apache Kafka to develop data pipelines of logs as a stream of messages using producers and consumers.
  • Worked with the Hue GUI for easy job scheduling, file browsing, job browsing, and metastore management.
  • Worked with Talend on a POC for integration of data from the data lake.
  • Highly involved in development/implementation of Cassandra environment.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
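A minimal, hypothetical sketch of the Hive-to-Spark conversion referenced above: it rewrites a simple Hive aggregation as Spark RDD transformations in Scala. The input path, table layout, and column positions (customer_events, state, amount) are illustrative assumptions, not taken from the project.

```scala
// Sketch: rewriting a Hive aggregation as Spark RDD transformations (names are placeholders).
import org.apache.spark.{SparkConf, SparkContext}

object HiveToRddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-to-rdd-sketch"))

    // Equivalent of: SELECT state, SUM(amount) FROM customer_events GROUP BY state
    val totalsByState = sc.textFile("hdfs:///data/customer_events") // one CSV record per line
      .map(_.split(","))
      .filter(_.length >= 3)                    // drop malformed rows
      .map(cols => (cols(1), cols(2).toDouble)) // (state, amount)
      .reduceByKey(_ + _)                       // aggregate per state

    totalsByState.saveAsTextFile("hdfs:///output/totals_by_state")
    sc.stop()
  }
}
```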

Environment: Hadoop, MapReduce, HDFS, Python, Hive, Spark, Hue, Pig, Sqoop, Kafka, AWS, Avro, HBase, Oozie, Cassandra, Impala, Zookeeper, Talend, Teradata, Oracle 11g/10g, Java (JDK 1.6), Scala, UNIX, SVN, Hortonworks, Maven.

Hadoop Developer

Confidential, IL

Responsibilities:

  • Responsible to manage data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
  • Developed data pipelines using Flume, Sqoop, Pig, and MapReduce to ingest behavioral data into HDFS for analysis.
  • Extracted files from MongoDB through Sqoop and placed in HDFS and processed.
  • Created a customized BI tool for the management team to perform query analytics using HiveQL.
  • Created Partitions, Buckets based on State to further process using Bucket based Hive joins.
  • Experienced in using Kafka as a data pipeline between JMS and Spark Streaming Applications.
  • Created storage with Amazon S3 for storing data. Worked on transferring data from Kafka topic into AWS S3 storage.
  • Worked on Python scripts to load data from CSV, JSON, MySQL, and Hive sources into the Neo4j graph database.
  • Estimated the hardware requirements for Name Node and Data Nodes & planning the cluster.
  • Created Hive generic UDFs, UDAFs, and UDTFs in Java to process business logic that varies by policy.
  • Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
  • Consolidating customer data from Lending, Insurance, Trading and Billing systems into data warehouse and mart subsequently for business intelligence reporting.
  • Optimizing the Hive queries using Partitioning and Bucketing techniques, for controlling the data distribution.
  • Experienced on Loading streaming data into HDFS using Kafka messaging system.
  • Used the Spark -Cassandra Connector to load data to and from Cassandra.
  • Worked with NoSQL database Hbase to create tables and store data.
  • Proficient in querying Hbase using Impala.
  • Worked on custom Pig Loaders and storage classes to work with variety of data formats such as JSON and XML file formats.
  • Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data onto HDFS.
  • Designed technical solutions for real-time analytics using Kafka and HBase.
  • Created UDF's to store specialized data structures in HBase and Cassandra.
  • Used Pig as ETL tool to do transformations, event joins, filter and some pre-aggregations.
  • Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
  • Experience in Upgrading Apache Ambari, CDH and HDP Cluster.
  • Configured and Maintained different topologies in storm cluster and deployed them on regular basis.
  • Imported structured data, tables into Hbase.
  • Involved in Backup, HA, and DR planning of applications in AWS.
  • Used Impala to read, write and query the Hadoop data in HDFS from HBase or Cassandra.
  • Experienced with different kind of compression techniques like LZO, GZip, and Snappy.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
  • Used AWS Patch Manager to select and deploy operating system and software patches across EC2 instances.
  • Created Data Pipeline of Map Reduce programs using Chained Mappers.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS.
  • Implemented Spark RDD transformations, actions to migrate Map reduce algorithms.
  • Set up, configured, and optimized the Cassandra cluster. Developed a real-time Java-based application to work with the Cassandra database.
  • Implemented optimized joins across different data sets to get the top claims by state using MapReduce.
  • Converted queries to Spark SQL, using Parquet files as the storage format.
  • Developed analytical components using Scala, Spark, and Spark Streaming.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (a minimal sketch follows this list).
  • Wrote Spark programs in Scala and ran Spark jobs on YARN.
  • Designed and implemented Solr search using the big data pipeline.
  • Assembled Hive and HBase with Solr to build a full pipeline for data analysis.
  • Wrote a Storm topology to emit data into Cassandra DB.
  • Experienced in syncing Solr with HBase to compute indexed views for data exploration.
  • Implemented map reduce programs to perform joins on the Map side using Distributed Cache in Java. Developed Unit test cases using Junit, Easy Mock and MRUnit testing frameworks.
  • Used in depth features of Tableau like Data Blending from multiple data sources to attain data analysis.
  • Used Maven extensively for building MapReduce jar files and deployed it to Amazon Web Services (AWS) using EC2 virtual Servers in the cloud.
  • Setup Amazon web services (AWS) to check whether Hadoop is a feasible solution or not.
  • Worked with BI teams in generating the reports and designing ETL workflows on Tableau.
  • Knowledgeable about Talend for data integration purposes.
  • Created a complete processing engine, based on Cloudera's distribution, enhanced for performance.
  • Experienced in monitoring the cluster using Cloudera Manager.
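A minimal sketch of the Kafka-to-HDFS Spark Streaming flow referenced above, written in Scala against the spark-streaming-kafka-0-10 integration. The broker addresses, topic name, group id, batch interval, and output path are placeholder assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hdfs"), Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092", // placeholder brokers
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-sink",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each 30-second batch of message values as text files under a common prefix.
    stream.map(_.value).saveAsTextFiles("hdfs:///data/streams/events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```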

Environment: Hadoop, HDFS, HBase, MapReduce, Java, JDK 1.5, J2EE 1.4, Struts 1.3, Spark, Python, Hive, Pig, Sqoop, Flume, Impala, Oozie, Hue, Solr, Zookeeper, Kafka, AWS, Cassandra, AVRO Files, SQL, ETL, DWH, Cloudera Manager, Talend, MySQL, Scala, MongoDB.

Hadoop Developer

Confidential, Dublin, OH

Responsibilities:

  • Analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
  • Creating multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Successfully loading files to Hive and HDFS from Oracle, SQL Server using Sqoop.
  • Writing Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
  • Creating Hive tables, loading with data and writing Hive queries.
  • Used Spark for fast processing of data and defined job flows.
  • Using Hive to analyze the partitioned data and compute various metrics for reporting (see the sketch after this list).
  • Moved data from HDFS to Cassandra using MapReduce and BulkOutputFormat class.
  • Managing and reviewing the Hadoop log files.
  • Using Pig as an ETL tool to do transformations, event joins, and some pre-aggregations.
  • Performed unit testing and delivered unit test plans and results documents.
  • Exporting data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
  • Worked on Oozie workflow engine for job scheduling.
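A brief, assumed sketch of the partitioned-table reporting metrics described above, expressed through Spark SQL with Hive support in Scala. The table name (web_logs), its columns, the date range, and the output table are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedHiveMetrics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-hive-metrics")
      .enableHiveSupport()
      .getOrCreate()

    // Partition pruning: only the requested dt partitions are scanned.
    val dailyMetrics = spark.sql(
      """SELECT dt, status, COUNT(*) AS hits, AVG(response_ms) AS avg_latency
        |FROM web_logs
        |WHERE dt BETWEEN '2016-01-01' AND '2016-01-31'
        |GROUP BY dt, status""".stripMargin)

    dailyMetrics.write.mode("overwrite").saveAsTable("reporting.web_log_daily_metrics")
    spark.stop()
  }
}
```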

Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, HBase, Kafka, AWS, Oozie, Zookeeper, Java, Spark, Scala.

Senior Hadoop Developer

Confidential, Austin, TX

Responsibilities:

  • Performed Hadoop development and implementation (environment: HDFS, HBase, Spark, Kafka, Oozie, Sqoop, Flume, Kerberos, Oracle ASO, MySQL).
  • Loaded data from disparate data sets using the Hadoop stack of ingestion and workflow tools, with pre-processing using Hive and Pig.
  • Designed, built, installed, configured, and supported Hadoop; translated complex functional and technical requirements into detailed designs.
  • Performed analysis of vast data stores to uncover insights while maintaining security and data privacy.
  • Managed and deployed HBase; was part of a POC effort to help build new Hadoop clusters.
  • Tested prototypes, oversaw handover to operational teams, and proposed best practices and standards.
  • Configured and implemented data marts on the Hadoop platform, loading data from Teradata and Oracle databases into HDFS using Sqoop queries.
  • Worked on setting up Kafka for streaming data and monitoring for the Kafka Cluster.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Worked on shell scripting in Linux and the Cluster. Used shell scripts to run hive queries from beeline.
  • Developed Scripts and automated data management from end to end and sync up between all the clusters.
  • Worked with Hue GUI in scheduling jobs with ease and File browsing, Job browsing, Metastore management.
  • Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run MapReduce jobs in the backend.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and processing with Sqoop and Hive.
  • Developed Spark jobs using Scala in the test environment for faster data processing and used Spark SQL for querying (see the sketch after this list).
  • Created Partitions, Buckets based on State to further process using Bucket based Hive joins.
  • Loaded multiple NOSQL databases including MongoDB, PostgreSQL, Couchbase, HBase and Cassandra.
  • Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, test automation.
  • Setting up Snowflake connections through private link from AWS EC2 and AWS EMR to secure data transfers between application and database.
  • Used Zookeeper for providing coordinating services to the cluster.
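An illustrative Scala sketch of the state-based partitioning and bucketing mentioned above, using Spark's native bucketed-table writer rather than a Hive enforce-bucketing insert. The staging table (policies_staging), output table, bucket count, and columns are assumptions.

```scala
import org.apache.spark.sql.SparkSession

object StateBucketedTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("state-bucketed-table")
      .enableHiveSupport()
      .getOrCreate()

    // Partition by state and bucket by customer_id so downstream joins on
    // customer_id can avoid a full shuffle (bucketed join).
    spark.table("policies_staging")
      .write
      .partitionBy("state")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("policies_bucketed")

    spark.stop()
  }
}
```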

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Spark, Sqoop, Kafka, Oozie, Big Data, Python, Java (JDK 1.6), DataStax, flat files, MySQL, TOAD, Windows NT, Linux, UNIX, Cassandra, SVN, Hortonworks, Avro files, SQL, ETL, DWH, Cloudera Manager, Talend, Scala, MongoDB.

Hadoop Developer

Confidential, Wilmington, DE

Responsibilities:

  • Involved in Design and development of common frameworks, utilities across work streams.
  • Coordinated with business users to understand business requirements as part of development activities.
  • Implemented a common JDBC utility for data sourcing in Spark (a minimal sketch follows this list).
  • Improved the performance of spark jobs by configuring job settings.
  • Optimized and tuned Spark applications using storage-level mechanisms (persist, cache).
  • Used HBase tables for storing the Kafka offset values.
  • Data Enrichment process handled using Spark for all dimension and fact tables.
  • Final tables are exported to Essbase two dimensional database for business validations.
  • Handled Spark return codes by adding a custom method for jobs running in cluster mode.
  • Used broadcast variables for input control files as part of the enrichment process.
  • Used Splunk for log analysis in the UAT and production environments to support operations.
  • Handled a Spark precision-loss issue by using the Scala BigDecimal datatype.
  • Imported data from different RDBMS systems such as Oracle, Teradata.
  • Implemented custom jar utility for Excel to CSV file conversion.
  • Used compression techniques such as Snappy, Gzip for data loads and archival.
  • Ran incremental stats and compute stats as daily and weekly jobs to overcome memory issues and long-running queries in Impala.
  • Involved in writing JIL scripts for scheduling jobs using the Autosys automation tool.
  • Automated daily, monthly, quarterly, and ad hoc data loads in Autosys to run per the scheduled calendar dates.
  • Involved in Production Support, BAU Activities and Release management.
  • Expertise in writing custom UDFs in Hive.
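A hypothetical sketch of the common JDBC sourcing utility mentioned above, in Scala using Spark's JDBC data source. The connection URL, credentials, table names, and partition bounds are placeholders to be supplied by the caller.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcSource {
  /** Reads one table (or pushdown query) over JDBC, with optional partitioned reads. */
  def read(spark: SparkSession,
           url: String,
           table: String,
           user: String,
           password: String,
           partitionColumn: Option[String] = None,
           numPartitions: Int = 8): DataFrame = {
    val base = spark.read.format("jdbc")
      .option("url", url)
      .option("dbtable", table)
      .option("user", user)
      .option("password", password)
      .option("fetchsize", "10000")

    partitionColumn match {
      case Some(col) =>
        // Caller supplies a numeric column; the bounds here are illustrative defaults.
        base.option("partitionColumn", col)
          .option("lowerBound", "1")
          .option("upperBound", "1000000")
          .option("numPartitions", numPartitions.toString)
          .load()
      case None => base.load()
    }
  }
}

// Example usage (Oracle source shown only as an assumption):
// val df = JdbcSource.read(spark, "jdbc:oracle:thin:@//dbhost:1521/ORCL",
//                          "SALES.ORDERS", "user", "pass", Some("ORDER_ID"))
```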

Environment: Cloudera Hadoop, Spark, Scala, Hive, HBase, Kafka, Essbase, Shell, Sqoop, Xml Workflows, Splunk, Teradata, Oracle, Hue, Impala, SVN, Bitbucket.

Hadoop Developer

Confidential, Charlotte, NC

Responsibilities:

  • Coordinated with business users to gather business requirements and interacted with technical leads for the Application design level.
  • Implemented all custom file upload processes in PySpark.
  • Implemented common jdbc utility for data sourcing in spark.
  • Worked on optimizing and tuning Spark applications using persist, cache, and broadcast (see the sketch at the end of this section).
  • Improved the performance of spark jobs by configuring job settings.
  • Involved in edge-node migration for an enterprise-level cluster and rebuilt the application per the new standards at the architecture level.
  • Created workflows in Oozie along with managing/coordinating the jobs and combining multiple jobs sequentially into one unit of work.
  • Imported and exported data from different RDBMS systems such as Oracle, Teradata, SQL Server, and Netezza, and from Linux systems such as SAS Grid.
  • Handled semi-structured data such as Excel and CSV files, imported from SAS Grid to HDFS using an SFTP process.
  • Ingested data into Hive tables using Sqoop and the SFTP process.
  • Used compression techniques such as Snappy, Gzip for data loads and archival.
  • Created data pipelines and implemented all kinds of data transformations using hadoop and spark.
  • Data level transformations have been done in intermediate tables before forming final tables.
  • Data Integrity checks have been handled using hive queries, Hadoop and Spark.
  • Exposed all reporting tables in Tableau via the Impala server for better performance.
  • Installed and implemented Kerberos security authentication for application keytabs.
  • Involved in writing JIL scripts for scheduling jobs using the Autosys automation tool.
  • Automated daily, monthly, quarterly, and ad hoc data loads in Autosys to run per the scheduled calendar dates.
  • Involved in production support, BAU activities, and release management.
  • Expertise in writing custom UDFs in Hive.

Environment: Cloudera Hadoop, PySpark, Hive, Pig, Shell, Sqoop, Oozie Workflows, Teradata, Netezza, SQL Server, Oracle, Hue, Impala, Jenkins, Kerberos.
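
An illustrative Scala sketch of the persist/cache/broadcast tuning pattern used in this project (the project code itself was PySpark; the input paths and the ref_codes lookup table are placeholder assumptions).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.storage.StorageLevel

object SparkTuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("tuning-sketch").getOrCreate()

    val facts = spark.read.parquet("hdfs:///data/facts")
      .persist(StorageLevel.MEMORY_AND_DISK) // reused by several downstream actions

    val refCodes = spark.read.option("header", "true").csv("hdfs:///data/ref_codes.csv")

    // Broadcast the small lookup table so the join avoids a full shuffle.
    val enriched = facts.join(broadcast(refCodes), Seq("code"))

    enriched.write.mode("overwrite").parquet("hdfs:///data/facts_enriched")
    facts.unpersist()
    spark.stop()
  }
}
```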

Hadoop Developer

Confidential, Denver CO

Responsibilities:

  • Planned, installed and configured the distributed Hadoop Clusters.
  • Ingested data using Sqoop to load data from MySQL into HDFS on a regular basis from various sources.
  • Configured Hadoop tools like Hive, Pig, Zookeeper, Flume, Impala and Sqoop.
  • Ingested data into Hive tables from MySQL, Pig and Hive using Sqoop.
  • Designed the flow and configured the individual components using Flume. Transferred bulk data from and to traditional databases with Sqoop.
  • Migrated SQL stored procedures into Hadoop transformations.
  • Wrote batch operations across multiple rows for DDL (Data Definition Language) and DML (Data Manipulation Language) for improved performance using client API calls.
  • Grouped and filtered data using Hive queries (HQL) and Pig Latin scripts.
  • Queried both Managed and External tables created by Hive using Impala.
  • Implemented partitioning and bucketing in Hive for more efficient querying of data.
  • Created workflows in Oozie along with managing/coordinating the jobs and combining multiple jobs sequentially into one unit of work.
  • Maintained distributed Storage HDFS and Columnar Storage HBase.
  • Analyzed data using Pig scripting, Hive Queries, and Impala.
  • Maintained and coordinated service Zookeeper apart from designing and monitoring Oozie workflows
  • Designed and created both Managed/ External tables depending on the requirement for Hive.
  • Wrote custom UDFs in Hive (a minimal Scala example follows this list).
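A hypothetical example of a custom Hive UDF, written here in Scala against Hive's legacy UDF API; the masking logic, class name, and registration statements are illustrative, not taken from the project.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

class MaskAccountNumber extends UDF {
  // Hive resolves evaluate() by reflection; keep the last 4 characters and mask the rest.
  def evaluate(input: Text): Text = {
    if (input == null) null
    else {
      val s = input.toString
      new Text(("*" * math.max(0, s.length - 4)) + s.takeRight(4))
    }
  }
}

// Registered in Hive (after packaging the jar), for example:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccountNumber';
//   SELECT mask_account(account_number) FROM accounts;
```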

Environment: Cloudera distribution CDH4, Hadoop, Map Reduce, MySQL, Linux, Hive, Pig, Impala, Sqoop, Zookeeper.
