
Sr. Hadoop Developer Resume


Chicago, IL

SUMMARY

  • Around 7 years of IT experience in application development across Java and Big Data.
  • Experience in Hadoop and its ecosystem: HDFS, MapReduce, Apache Pig, Hive, HBase, Oozie, Scala, Spark, Flume, Kafka, Storm and Sqoop.
  • Experienced with Hadoop ecosystem components and distributions such as Hadoop MapReduce, Cloudera, Hortonworks, HBase, Oozie, Hive, Sqoop, Pig, Tez, Flume, Kafka, Storm, Spark, Scala, MongoDB, Couchbase and Cassandra.
  • Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java and Scala (a minimal MapReduce sketch follows this summary).
  • Good knowledge of building Apache Spark applications using Scala.
  • Good exposure to Big Data technologies and the Hadoop ecosystem, with an in-depth understanding of MapReduce and the Hadoop infrastructure.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • Strong experience in architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools like Spark Streaming, Spark SQL, MLlib, Kafka, Flume, MapReduce and Hive.
  • Cassandra developer: set up, configured and optimized the Cassandra cluster; developed real-time Java-based applications that work with the Cassandra database.
  • Proficient in Cassandra data modeling and analysis and CQL (Cassandra Query Language), with 2 years of hands-on Cassandra experience.
  • Hadoop administrator and developer: set up, configured and monitored Hadoop clusters and performed cluster performance tuning.
  • Skilled in managing and reviewing Hadoop log files.
  • Expert in importing and exporting data into HDFS and Hive using Sqoop.
  • Experienced in loading data into Hive partitions and creating buckets in Hive.
  • Experienced in configuring Flume to stream data into HDFS.
  • Experienced in real-time Big Data solutions using HBase, handling billions of records.
  • Extensive experience working with structured, semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
  • Familiarity with Hadoop architecture and its components such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode and DataNode.
  • Experienced in application development using Java, Hadoop, RDBMS and Linux shell scripting, and in performance tuning.
  • Expertise in writing ETL jobs for analyzing data using Pig.
  • Experienced in querying with Impala.
  • Experience in data warehousing with ETL tool Oracle Warehouse Builder (OWB).
  • Familiarity with distributed coordination system Zookeeper.
  • Experienced in implementing unified data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
  • Experienced in working with Solr indexing and querying.
  • Excellent knowledge in Java and SQL in application development and deployment.
  • In-depth understanding of Data Structures and Algorithms and Optimization.
  • Very good understanding of and hands-on experience with relational databases like MySQL and Oracle, and NoSQL databases like HBase, MongoDB, Couchbase and Cassandra.
  • Well versed with databases like MS SQL Server 2012 and 2008, Oracle 11g/10g/9i and MySQL.
  • Passionate about working with Hadoop and Big Data technologies, data science, machine learning in Spark, Big Data processing, analytics and visualization.
  • Versatile experience in using Java tools in business, web and client-server environments, including the Java platform, JSP, Servlets, JavaBeans and JDBC.
  • Expertise in developing presentation-layer components with HTML, CSS, JavaScript, jQuery, XML, JSON, AJAX and D3.
  • Experienced with source control repositories such as SVN and GitHub.
  • Good knowledge of analyzing data in HBase using Hive and Pig.
  • Experienced in detailed system design using use-case analysis, functional analysis, and program modeling with UML class, sequence, activity and state diagrams.
  • Worked with data warehouse architecture and design: star schema, snowflake schema, fact and dimension tables, and physical and logical data modeling.
  • Designed mapping documents for Big Data applications.
  • Experienced in Agile and Scrum.
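
As an illustration of the custom MapReduce work referenced above, here is a minimal, self-contained Java sketch of the mapper/reducer pattern. It is a generic token-count job, not code from any project described here, and the class and path names are placeholders.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class KeyCountJob {

        // Mapper: split each input line into tokens and emit (token, 1).
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text outKey = new Text();

            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                for (String token : line.toString().split("\\s+")) {
                    outKey.set(token);
                    ctx.write(outKey, ONE);
                }
            }
        }

        // Reducer: sum the counts emitted for each token.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text token, Iterable<IntWritable> counts, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable c : counts) {
                    sum += c.get();
                }
                ctx.write(token, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "key-count");
            job.setJarByClass(KeyCountJob.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class);   // safe because the sum is associative
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }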

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Spark, Spark Streaming, Spark SQL, Kafka, Cloudera CDH4/CDH5, Hortonworks, Hadoop Streaming, ZooKeeper, Oozie, Sqoop, Flume, Impala, NiFi, Solr, Tez, Ranger, Talend, Tableau/QlikView

NoSQL: HBase, MongoDB, Couchbase, Neo4j, Cassandra

Languages: Java/ J2EE, SQL, Shell Scripting, C/C++, Python, Scala

Web Technologies: HTML, JavaScript, CSS, XML, Servlets, SOAP, Amazon AWS, Google App Engine

Web/Application Servers: Apache Tomcat, LDAP, JBoss, IIS

Operating system: Windows, Macintosh, Linux and Unix

Frameworks: Spring, Spring MVC, Hibernate, Swing

DBMS / RDBMS: Oracle 11g/10g/9i, SQL Server 2012/2008, MySQL

IDE: Eclipse, Microsoft Visual Studio (2008, 2012), NetBeans, Spring Tool Suite

Version Control: SVN, CVS, Rational ClearCase Remote Client, GitHub, Visual Studio

Tools: FileZilla, PuTTY, TOAD SQL Client, MySQL Workbench, ETL, DWH, JUnit, Oracle SQL Developer, WinSCP, Tahiti, Cygwin

PROFESSIONAL EXPERIENCE

Confidential, Chicago IL

Sr. Hadoop Developer

Responsibilities:

  • Moved data into the data lake using Sqoop, cleansed it with Pig and loaded it into Hive tables.
  • Used recursive queries with several levels of joins in Hive to transform SQL stored procedures so that they work in the data lake.
  • Moved files using WebHDFS URLs to locate and pull files from the source (MapR) to the destination (Hortonworks cluster).
  • Performed optimization at all levels of development: Raw, Refined and Enriched.
  • Moved the data from MapR to Hortonworks during the migration.
  • Prepared XML files and automated the ingestion jobs using Oozie and Falcon.
  • Ran and optimized Pig jobs to cleanse the data.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
  • Worked on a supply-chain project to ingest data from source systems into the data lake.
  • Managed and reviewed Oozie and Falcon log files and made the updates needed to view them in the dashboard visualization.
  • Worked with solution design teams to arrive at the optimal set of tools for each specific source of data.
  • Developed use cases for processing real-time streaming data using Spark Streaming (a minimal sketch follows this list).
  • Worked with the due-diligence team to explore whether NiFi was a feasible option for our solution.
  • Processed streaming data using ingestion tools like Kafka and Flume.
  • Developed a data-retention mechanism and automation to purge archived data using shell scripts and Oozie.
  • Worked with both Agile and Waterfall methodologies.
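
A minimal Java sketch of the Spark Streaming ingestion pattern referenced above, written against the spark-streaming-kafka-0-10 direct-stream API. The broker address, topic, consumer group and HDFS path are assumed placeholders, not values from the project.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaToHdfsStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("kafka-to-hdfs");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");   // placeholder broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "ingest-events");           // placeholder consumer group
            kafkaParams.put("auto.offset.reset", "latest");

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(Arrays.asList("events"), kafkaParams));

            // Land each micro-batch in the raw zone of the data lake as plain text files.
            stream.foreachRDD((rdd, time) ->
                rdd.map(ConsumerRecord::value)
                   .saveAsTextFile("/data/raw/events/" + time.milliseconds()));

            jssc.start();
            jssc.awaitTermination();
        }
    }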

Environment: Hadoop, HDFS, Hive, MapReduce, Shell, Spark, Tez, Pig, Sqoop, Flume, Kafka, Storm, NiFi, HBase, Oozie, Falcon, MapR, Hortonworks

Confidential, Houston TX

Sr. Hadoop Developer

Responsibilities:

  • Moved relational database data into the data lake using Sqoop as the ETL tool, loading Hive dynamic-partition tables (a Spark SQL sketch of this pattern follows this list).
  • Used recursive queries with several levels of joins in Hive to transform SQL stored procedures so that they work in the data lake.
  • Optimized Hive queries using partitioning and bucketing techniques with ACID properties to control data distribution.
  • Ran Hive queries through different execution engines: Spark, MapReduce and Tez.
  • Worked with Hive and the NoSQL database HBase to create tables and store data.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
  • Connected JSON web services built on REST and SOAP APIs to the databases and extracted data using Java.
  • Used the Hortonworks distribution of Hadoop.
  • Used Pig as an ETL tool for transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Used Kerberos for authentication and Ranger for access control.
  • Used distributed copy (DistCp) for intra-cluster data transfer.
  • Used the Oozie workflow engine and coordinators to manage timed, interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and processing with Sqoop and Hive.
  • Tuned Hive performance for the most efficient results.
  • Used MapReduce programs with chained mappers to create data pipelines.
  • Implemented various optimization techniques in Hive on Tez for effective result sets.
  • Implemented optimized joins across different data sets to get top claims by state using MapReduce.
  • Created a complete processing engine based on the Hortonworks distribution, tuned for performance.
  • Monitored cluster health using Ambari.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Consolidated employee data from clients, consultants and employees around the globe into the data lake and visualized the resource trend report in QlikView.
  • Imported large data sets from MySQL into HDFS using Sqoop on a regular basis.
  • Created a customized BI tool for the management team to perform query analytics using HiveQL.
  • Created partitions and buckets based on state for further processing with bucket-based Hive joins.
  • Saved storage space on HDFS by using compression codecs like LZO, GZip, ZLib and Snappy.
  • Compressed the data to fit into the infrastructure with minimal hardware requirements.
  • Created Hive generic (one-to-many and many-to-one) UDFs, UDAFs and UDTFs in Java and Python to handle business logic that varies by convention (a UDF sketch follows this list).
  • Worked on shell scripting on Linux and the cluster; used shell scripts to run Hive queries from Beeline.
  • Recreated SQL stored-procedure logic in HiveQL to run analytics on the imported data.
  • Used various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure and Union.
  • Worked on improving performance of Hive and Pig queries.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Designed a conceptual model with Spark for performance optimization.
  • Implemented custom MapReduce partitioners and custom Writables.
  • Implemented test scripts to support test driven development and continuous integration.
  • Trained and Mentored analyst / test team for writing, running and validating Sqoop scripts and Hive Queries.
  • Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files.
  • Implemented a script to transmit sys print information from MySQL to HBase using Sqoop.
  • Implemented Spark with Scala and Spark SQL for faster testing and processing of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Wrote Scala programs that run on Spark and used the Hue interface to query the data.
  • Developed Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
  • Worked with Talend on a POC for integration of data from the data lake.
  • Worked in a Linux/Unix environment.
  • Worked in Agile development team environment, in Sprints with daily scrum meetings.
  • Wrote technical documentation covering the whole environment.
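
A minimal Java sketch of the Hive dynamic-partition loading and Hive-to-Spark SQL pattern referenced above. The table names, columns and storage format are illustrative assumptions, not the project's actual schema.

    import org.apache.spark.sql.SparkSession;

    public class HiveOnSparkExample {
        public static void main(String[] args) {
            // Hive-enabled session so HiveQL runs against the cluster metastore.
            SparkSession spark = SparkSession.builder()
                .appName("hive-dynamic-partitions")
                .enableHiveSupport()
                .getOrCreate();

            // Allow fully dynamic partitioning for the INSERT below.
            spark.sql("SET hive.exec.dynamic.partition = true");
            spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict");

            // Placeholder tables: claims_stage is the Sqoop staging table,
            // claims is partitioned by state.
            spark.sql("CREATE TABLE IF NOT EXISTS claims (claim_id BIGINT, amount DOUBLE) "
                    + "PARTITIONED BY (state STRING) STORED AS ORC");
            spark.sql("INSERT OVERWRITE TABLE claims PARTITION (state) "
                    + "SELECT claim_id, amount, state FROM claims_stage");

            // The same session can run the analytic joins formerly written as stored procedures.
            spark.sql("SELECT state, COUNT(*) AS claim_count FROM claims GROUP BY state")
                 .show(20);

            spark.stop();
        }
    }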
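
Also referenced above are Hive generic UDFs written in Java. Below is a minimal sketch of one; the function name and the normalization logic are invented for illustration.

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
    import org.apache.hadoop.io.Text;

    @Description(name = "normalize_code",
                 value = "_FUNC_(str) - trims and upper-cases a code column; NULL in, NULL out")
    public class NormalizeCodeUDF extends GenericUDF {

        private StringObjectInspector inputOI;

        @Override
        public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
            if (args.length != 1 || !(args[0] instanceof StringObjectInspector)) {
                throw new UDFArgumentException("normalize_code expects a single string argument");
            }
            inputOI = (StringObjectInspector) args[0];
            return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
        }

        @Override
        public Object evaluate(DeferredObject[] args) throws HiveException {
            Object raw = args[0].get();
            if (raw == null) {
                return null;                 // pass NULLs through untouched
            }
            String value = inputOI.getPrimitiveJavaObject(raw);
            return new Text(value.trim().toUpperCase());
        }

        @Override
        public String getDisplayString(String[] children) {
            return "normalize_code(" + children[0] + ")";
        }
    }

Such a UDF is typically registered in a session with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_code AS '<fully qualified class name>'.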

Environment: Hadoop, HDFS, HBase, MapReduce, Java, Python, REST, Spark, Hive, Beeline, Tez, Pig, Sqoop, Flume, Oozie, Hue, ZooKeeper, Ambari, Scala, SQL, ETL, DWH, Hortonworks, Ranger, Talend, MySQL.

Confidential, CA

Hadoop Developer

Responsibilities:

  • Responsible for managing data coming from different sources, maintaining HDFS, and loading structured and unstructured data.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Imported data from MySQL into HDFS using Sqoop on a regular basis.
  • Created a customized BI tool for the management team to perform query analytics using HiveQL.
  • Created partitions and buckets based on state for further processing with bucket-based Hive joins.
  • Estimated the hardware requirements for the NameNode and DataNodes and planned the cluster.
  • Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
  • Consolidated customer data from lending, insurance, trading and billing systems into the data warehouse, and subsequently into marts, for business intelligence reporting.
  • Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
  • Worked with the NoSQL database HBase to create tables and store data.
  • Queried HBase using Impala.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
  • Used Pig as an ETL tool for transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Designed the technical solution for real-time analytics using Kafka and HBase.
  • Used Pig as an ETL tool for transformations, event joins, filters and some pre-aggregations.
  • Collaborated with business users on requirements gathering to build Tableau reports per business needs.
  • Upgraded Apache Ambari, CDH and HDP clusters.
  • Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
  • Imported structured data and tables into HBase.
  • Worked with different compression codecs like LZO, GZip and Snappy.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Created a data pipeline of MapReduce programs using chained mappers.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS.
  • Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Set up, configured and optimized the Cassandra cluster; developed a real-time Java-based application that works with the Cassandra database (a driver sketch follows this list).
  • Worked with the Cassandra database; proficient in Cassandra data modeling and analysis and CQL (Cassandra Query Language).
  • Implemented optimized joins across different data sets to get top claims by state using MapReduce.
  • Converted queries to Spark SQL and used Parquet files as the storage format.
  • Developed an analytical component using Scala, Spark and Spark Streaming.
  • Used Spark Streaming in Scala to receive real-time data from Kafka and store the streamed data in HDFS.
  • Wrote Spark programs in Scala and ran Spark jobs on YARN.
  • Designed and Implemented Solr Search using the big data pipeline.
  • Assembled Hive and HBase with Solr to build a full pipeline for data analysis.
  • Synced Solr with HBase to compute indexed views for data exploration.
  • Implemented MapReduce programs to perform map-side joins using the distributed cache in Java (a map-side-join sketch follows this list).
  • Developed unit test cases using the JUnit, EasyMock and MRUnit testing frameworks.
  • Used in depth features of Tableau like Data Blending from multiple data sources to attain data analysis.
  • Upgraded the Hadoop cluster (HBase/ZooKeeper) from CDH3 to CDH4.
  • Used Maven extensively to build MapReduce JAR files and deployed them to Amazon Web Services (AWS) EC2 virtual servers in the cloud.
  • Built scripts for continuous-integration systems; exposure to Amazon Web Services (AWS) cloud computing (EMR, EC2 and S3 services).
  • Used Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.
  • Worked with BI teams in generating the reports and designing ETL workflows on Tableau.
  • Knowledgeable in Talend for data integration.
  • Created a complete processing engine based on Cloudera's distribution, tuned for performance.
  • Monitored the cluster using Cloudera Manager.
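
A minimal Java sketch of the real-time Cassandra access referenced above, written against the DataStax Java driver 3.x API. The contact point, keyspace, table and columns are assumed placeholders.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class ClaimEventStore {
        public static void main(String[] args) {
            // Placeholder contact point and keyspace; real cluster settings would differ.
            try (Cluster cluster = Cluster.builder().addContactPoint("cassandra-node1").build();
                 Session session = cluster.connect("claims_ks")) {

                PreparedStatement insert = session.prepare(
                    "INSERT INTO claim_events (claim_id, event_time, status) "
                    + "VALUES (?, toTimestamp(now()), ?)");
                session.execute(insert.bind(1001L, "OPEN"));     // write one event

                ResultSet rs = session.execute(
                    "SELECT claim_id, status FROM claim_events WHERE claim_id = 1001");
                for (Row row : rs) {
                    System.out.println(row.getLong("claim_id") + " -> " + row.getString("status"));
                }
            }
        }
    }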
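
A minimal Java sketch of the map-side join with the distributed cache referenced above. The file layouts and field positions are assumptions for illustration, and the small lookup file is assumed to be readable from the task's working directory by its base name.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MapSideJoinJob {

        public static class JoinMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
            private final Map<String, String> stateLookup = new HashMap<>();

            @Override
            protected void setup(Context ctx) throws IOException {
                // Small lookup file shipped via the distributed cache (addCacheFile in main).
                URI[] cached = ctx.getCacheFiles();
                String localName = new Path(cached[0].getPath()).getName();
                try (BufferedReader in = new BufferedReader(new FileReader(localName))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        String[] parts = line.split(",");        // state_code,state_name
                        stateLookup.put(parts[0], parts[1]);
                    }
                }
            }

            @Override
            protected void map(LongWritable offset, Text claim, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = claim.toString().split(",");   // claim_id,state_code,amount
                String stateName = stateLookup.getOrDefault(fields[1], "UNKNOWN");
                ctx.write(new Text(fields[0] + "," + stateName + "," + fields[2]), NullWritable.get());
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "map-side-join");
            job.setJarByClass(MapSideJoinJob.class);
            job.setMapperClass(JoinMapper.class);
            job.setNumReduceTasks(0);                            // map-only join
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            job.addCacheFile(new URI(args[2]));                  // small lookup file on HDFS
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }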

Environment: Hadoop, HDFS, HBase, MapReduce, Java, JDK 1.5, J2EE 1.4, Struts 1.3, Spark, Hive, Pig, Sqoop, Flume, Impala, Oozie, Hue, Solr, ZooKeeper, Kafka, Avro files, SQL, ETL, DWH, Cloudera Manager, Talend, MySQL, Scala, MongoDB.

Confidential

Java/J2EE Developer

Responsibilities:

  • Worked with the business team; attended daily scrum meetings, sprint planning, sprint review and sprint retrospective; and worked with the Product Owner on artifacts such as the product backlog.
  • Implemented features like logging and user-session validation using the Spring AOP module (a minimal aspect sketch follows this list).
  • Used the Spring MVC framework at the business tier and the Spring BeanFactory for initializing services.
  • Worked extensively on Spring IoC/dependency injection and configured cross-cutting concerns like logging and security using Spring AOP.
  • Integrated Spring and Hibernate, injecting the HibernateTemplate class into the DAOs.
  • Coded numerous DAOs using HibernateDaoSupport; used Criteria, HQL and SQL as the query languages in Hibernate mappings; integrated the Spring and Hibernate frameworks.
  • Developed the data access layer using the Hibernate ORM framework.
  • Developed shell scripts to call stored procedures residing in the database.
  • Used XML for data exchange and schemas (XSDs) for XML validation. Used XSLT for transformation of XML.
  • Consumed SOAP based web services using Spring to interact with external systems.
  • Implemented SOA architecture with web services using SOAP, WSDL and XML.
  • Used Apache CXF to post messages to external vendor sites and exposed web services to other client applications such as an admin tool.
  • Employed the Waterfall model and best practices for software development.
  • Deployed the application in JBoss Application Server.
  • Used SVN for version control and Maven build scripts for deployment.
  • Implemented the Java Message Service (JMS) for asynchronous messaging using message-driven beans; used message-driven beans to call the EJBs.
  • Used JUnit to create test cases for all the business rules and the application code.
  • Communicated with ILOG Rules using EJB Remote Lookup.
  • Used JiBX binding to convert Java objects to XML and vice versa.
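
A minimal Java sketch of a Spring AOP logging aspect of the kind referenced above. The pointcut expression and package name are placeholders; the annotation-driven AspectJ style shown here requires <aop:aspectj-autoproxy/> (or @EnableAspectJAutoProxy) in the Spring configuration.

    import org.aspectj.lang.ProceedingJoinPoint;
    import org.aspectj.lang.annotation.Around;
    import org.aspectj.lang.annotation.Aspect;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.stereotype.Component;

    @Aspect
    @Component
    public class ServiceLoggingAspect {

        private static final Logger log = LoggerFactory.getLogger(ServiceLoggingAspect.class);

        // Placeholder pointcut: every public method in the service layer.
        @Around("execution(public * com.example.app.service..*.*(..))")
        public Object logInvocation(ProceedingJoinPoint jp) throws Throwable {
            long start = System.currentTimeMillis();
            log.info("Entering {}", jp.getSignature().toShortString());
            try {
                return jp.proceed();                 // run the advised method
            } finally {
                log.info("Exiting {} after {} ms",
                         jp.getSignature().toShortString(),
                         System.currentTimeMillis() - start);
            }
        }
    }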

Confidential

Software Developer

Responsibilities:

  • Worked with the business team; attended daily scrum meetings, sprint planning, sprint review and sprint retrospective; and worked with the Product Owner on artifacts such as the product backlog.
  • Implemented features like logging and user-session validation using the Spring AOP module.
  • Used the Spring MVC framework at the business tier and the Spring BeanFactory for initializing services.
  • Worked extensively on Spring IoC/dependency injection and configured cross-cutting concerns like logging and security using Spring AOP.
  • Integrated Spring and Hibernate, injecting the HibernateTemplate class into the DAOs.
  • Coded numerous DAOs using HibernateDaoSupport; used Criteria, HQL and SQL as the query languages in Hibernate mappings; integrated the Spring and Hibernate frameworks (a minimal DAO sketch follows this list).
  • Developed the data access layer using the Hibernate ORM framework.
  • Developed shell scripts to call stored procedures residing in the database.
  • Used XML for data exchange and schemas (XSDs) for XML validation. Used XSLT for transformation of XML.
  • Consumed SOAP based web services using Spring to interact with external systems.
  • Implemented SOA architecture with web services using SOAP, WSDL and XML.
  • Used Apache CXF to post messages to external vendor sites and exposed web services to other client applications such as an admin tool.
  • Employed the Waterfall model and best practices for software development.
  • Deployed the application in JBoss Application Server.
  • Used SVN for version control and Maven build scripts for deployment.
  • Implemented the Java Message Service (JMS) for asynchronous messaging using message-driven beans; used message-driven beans to call the EJBs.
  • Used JUnit to create test cases for all the business rules and the application code.
  • Communicated with ILOG Rules using EJB Remote Lookup.
  • Used JiBX binding to convert Java objects to XML and vice versa.
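
A minimal Java sketch of the HibernateDaoSupport DAO pattern referenced above. The Customer entity, its fields and the HQL query are invented for illustration; in practice the sessionFactory property is injected through the Spring configuration.

    import java.util.List;
    import org.springframework.orm.hibernate3.support.HibernateDaoSupport;

    // Placeholder entity used only in this sketch; the real mapped entities differed.
    class Customer {
        private Long id;
        private String region;
        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getRegion() { return region; }
        public void setRegion(String region) { this.region = region; }
    }

    public class CustomerDao extends HibernateDaoSupport {

        public void save(Customer customer) {
            getHibernateTemplate().saveOrUpdate(customer);
        }

        @SuppressWarnings("unchecked")
        public List<Customer> findByRegion(String region) {
            // HQL against the mapped Customer entity, with a positional parameter.
            return (List<Customer>) getHibernateTemplate().find(
                "from Customer c where c.region = ?", region);
        }
    }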
