Sr. Hadoop Developer Resume
Chicago, IL
SUMMARY
- Around 7 years of IT experience in application development in the Java and Big Data domains.
- Experience in Hadoop and its ecosystem: HDFS, MapReduce, Apache Pig, Hive, HBase, Oozie, Scala, Spark, Flume, Kafka, Storm and Sqoop.
- Experienced with Hadoop ecosystem components such as MapReduce, Cloudera, Hortonworks, HBase, Oozie, Hive, Sqoop, Pig, Tez, Flume, Kafka, Storm, Spark, Scala, MongoDB, Couchbase and Cassandra.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs written in Java and Scala.
- Good knowledge of building Apache Spark applications using Scala.
- Good exposure to Big Data technologies and the Hadoop ecosystem, with an in-depth understanding of MapReduce and the Hadoop infrastructure.
- Excellent knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Strong experience in architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools like Spark Streaming, Spark SQL, MLlib, Kafka, Flume, MapReduce and Hive.
- Cassandra developer: set up, configured and optimized the Cassandra cluster; developed real-time Java-based applications that work with the Cassandra database.
- Proficient in Cassandra data modeling and analysis and CQL (Cassandra Query Language), with 2 years of in-depth experience with the Cassandra database.
- Hadoop administrator and developer: set up, configured and monitored Hadoop clusters and performed cluster performance tuning.
- Skilled in managing and reviewing Hadoop log files.
- Expert in importing and exporting data to and from HDFS and Hive using Sqoop.
- Experienced in loading data into Hive partitions and creating buckets in Hive.
- Experienced in configuring Flume to stream data into HDFS.
- Experienced in real-time Big Data solutions using HBase, handling billions of records.
- Extensive experience working with structured, semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Familiarity with Hadoop architecture and its components such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode and DataNode.
- Experienced in application development using Java, Hadoop, RDBMS and Linux shell scripting, as well as performance tuning.
- Expertise in writing ETL jobs for analyzing data using Pig.
- Experienced in querying data with Impala.
- Experience in data warehousing with ETL tool Oracle Warehouse Builder (OWB).
- Familiarity with the distributed coordination service ZooKeeper.
- Experienced in implementing unified data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
- Experienced in working with Solr indexing and querying.
- Excellent knowledge of Java and SQL for application development and deployment.
- In-depth understanding of data structures, algorithms and optimization.
- Very good understanding of, and hands-on work with, relational databases like MySQL and Oracle and NoSQL databases like HBase, MongoDB, Couchbase and Cassandra.
- Well versed with databases like MS SQL Server 2012 and 2008, Oracle 11g/10g/9i and MySQL.
- Passionate about working with Hadoop and Big Data technologies, data science, machine learning in Spark, Big Data processing, analytics and visualization.
- Versatile experience in utilizing Java tools in business, web and client-server environments, including the Java platform, JSP, Servlets, JavaBeans and JDBC.
- Expertise in developing presentation-layer components with HTML, CSS, JavaScript, jQuery, XML, JSON, AJAX and D3.
- Experienced with source control repositories such as SVN and GitHub.
- Good knowledge of analyzing data in HBase using Hive and Pig.
- Experienced in detailed system design using use-case analysis, functional analysis, and UML modeling with class, sequence, activity and state diagrams.
- Worked with data warehouse architecture and design: star schema, snowflake schema, fact and dimension tables, and physical and logical data modeling.
- Designed mapping documents for Big Data applications.
- Experienced in Agile and Scrum.
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Spark, Spark Streaming, Spark SQL, Kafka, Cloudera CDH4/CDH5, Hortonworks, Hadoop Streaming, ZooKeeper, Oozie, Sqoop, Flume, Impala, NiFi, Solr, Tez, Ranger, Talend, Tableau/QlikView
NoSQL: HBase, MongoDB, Couchbase, Neo4j, Cassandra
Languages: Java/ J2EE, SQL, Shell Scripting, C/C++, Python, Scala
Web Technologies: HTML, JavaScript, CSS, XML, Servlets, SOAP, Amazon AWS, Google App Engine
Web/Application Servers: Apache Tomcat, LDAP, JBoss, IIS
Operating Systems: Windows, Mac OS, Linux and Unix
Frameworks: Spring, MVC, Hibernate, Swing
DBMS / RDBMS: Oracle 11g/10g/9i, SQL Server 2012/2008, MySQL
IDEs: Eclipse, Microsoft Visual Studio (2008, 2012), NetBeans, Spring Tool Suite
Version Control: SVN, CVS, Rational ClearCase Remote Client, GitHub, Visual Studio
Tools: FileZilla, PuTTY, TOAD SQL Client, MySQL Workbench, ETL, DWH, JUnit, Oracle SQL Developer, WinSCP, Tahiti, Cygwin
PROFESSIONAL EXPERIENCE
Confidential, Chicago IL
Sr. Hadoop Developer
Responsibilities:
- Moved data into the data lake using Sqoop, cleansed it with Pig and loaded it into Hive tables.
- Transformed SQL stored procedures into recursive Hive queries built from several levels of joins so they would work in the data lake.
- Moved files using WebHDFS URLs to find and pull the files from the source (MapR) to the destination (Hortonworks cluster); see the sketch below.
- Performed optimization at all levels of the data lake: Raw, Refined and Enriched.
- Moved the data from MapR to Hortonworks during the migration.
- Prepared XML files and automated the ingestion jobs using Oozie and Falcon.
- Ran and optimized Pig frameworks to cleanse the data.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Worked on a supply chain project to ingest data from source systems into the data lake.
- Managed and reviewed Oozie and Falcon log files and made the updates needed to view them in the dashboard visualization.
- Worked with solution design teams to arrive at the optimal solution and the appropriate set of tools for each specific data source.
- Developed use cases for processing real-time streaming data using tools like Spark Streaming.
- Worked with the due diligence team to explore whether NiFi was a feasible option for our solution.
- Processed streaming data using ingestion tools like Kafka and Flume.
- Developed a data retention mechanism and automation to purge archived data using shell scripts and Oozie.
- Worked in both Agile and Waterfall methodologies.
Environment: Hadoop, HDFS, Hive, MapReduce, Shell, Spark, TEZ, Pig, Sqoop, Flume, Kafka, Storm, Nifi, HBase, Oozie, Falcon, MapR, Hortonworks
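The cross-cluster file movement described in this role can be sketched with the Hadoop FileSystem API over WebHDFS. This is a minimal, illustrative sketch only; the host names, ports and paths below are placeholders, not project details.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Minimal sketch: copy files from a source cluster (MapR) to a destination
// cluster (Hortonworks) through their WebHDFS endpoints. All endpoints and
// paths below are placeholders.
public class WebHdfsCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        FileSystem source = FileSystem.get(new URI("webhdfs://mapr-edge.example.com:14000"), conf);
        FileSystem target = FileSystem.get(new URI("webhdfs://hdp-namenode.example.com:50070"), conf);

        Path sourceDir = new Path("/data/raw/supply_chain");
        Path targetDir = new Path("/datalake/raw/supply_chain");

        // Find the files on the source side and copy each one across without
        // deleting the original (deleteSource = false).
        for (FileStatus status : source.listStatus(sourceDir)) {
            if (status.isFile()) {
                FileUtil.copy(source, status.getPath(),
                        target, new Path(targetDir, status.getPath().getName()),
                        false, conf);
            }
        }

        source.close();
        target.close();
    }
}
```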
Confidential, Houston TX
Sr. Hadoop Developer
Responsibilities:
- Moved relational database data into the data lake and Hive dynamic-partition tables using Sqoop as the ETL tool.
- Transformed SQL stored procedures into recursive Hive queries built from several levels of joins so they would work in the data lake.
- Optimized Hive queries using partitioning and bucketing techniques, with ACID properties, to control data distribution.
- Ran Hive queries through different execution engines: Spark, MapReduce and Tez.
- Worked with Hive and the NoSQL database HBase to create tables and store data.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Connected JSON web services built on REST and SOAP APIs to the databases and extracted data using Java.
- Used the Hortonworks distribution of Hadoop.
- Used Pig as an ETL tool for transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Used Kerberos for authentication and Ranger for access control.
- Used distributed copy (DistCp) for intra-cluster data transfer.
- Used the Oozie workflow engine and coordinators to manage timed, interdependent Hadoop jobs and to automate several job types such as Java MapReduce, Hive and Sqoop.
- Developed Oozie workflows to automate the tasks of loading data into HDFS and processing it with Sqoop and Hive.
- Tuned Hive performance for the most efficient results.
- Used MapReduce programs with chained mappers to create data pipelines.
- Implemented various optimization techniques in Hive on Tez for effective result sets.
- Implemented optimized joins across different data sets in MapReduce to get the top claims by state.
- Created a complete processing engine, based on the Hortonworks distribution, tuned for performance.
- Monitored cluster health using Ambari.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Consolidated employee data from clients, consultants and employees around the globe into the data lake and visualized the resource trend report through QlikView.
- Imported large data sets from MySQL into HDFS using Sqoop on a regular basis.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Created partitions and buckets based on state for further processing with bucket-based Hive joins.
- Saved storage space on HDFS using compression codecs such as LZO, GZip, ZLib and Snappy.
- Compressed the data to fit into the infrastructure with minimal hardware requirements.
- Created Hive generic (one-to-many and many-to-one) UDFs, UDAFs and UDTFs in Java and Python to process business logic that varies by convention; see the sketch below.
- Worked on shell scripting in Linux on the cluster and used shell scripts to run Hive queries from Beeline.
- Recreated SQL stored-procedure logic in HiveQL to run analytics on the imported data.
- Used various transformations such as Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure and Union.
- Worked on improving performance of Hive and Pig queries.
- Supported setting up the QA environment and updating configurations to implement Pig and Sqoop scripts.
- Designed a conceptual model with Spark for performance optimization.
- Implemented custom MapReduce partitioners and custom Writables.
- Implemented test scripts to support test driven development and continuous integration.
- Trained and mentored the analyst/test team in writing, running and validating Sqoop scripts and Hive queries.
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats to text files.
- Implemented a script to transmit sys print information from MySQL to HBase using Sqoop.
- Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Wrote Scala programs that run on Spark and worked with the Hue interface for querying the data.
- Developed Spark jobs in Scala in the test environment for faster data processing and used Spark SQL for querying.
- Worked with Talend on a POC to integrate data from the data lake.
- Worked in a Linux/Unix environment.
- Worked in Agile development team environment, in Sprints with daily scrum meetings.
- Wrote technical documentation covering the whole environment.
Environment: Hadoop, HDFS, HBase, MapReduce, Java, Python, REST, Spark, Hive, Beeline, TEZ, Pig, Sqoop, Flume, Oozie, Hue, Zookeeper, Ambari, Scala, SQL, ETL, DWH, Hortonworks, Ranger, Talend, MySQL.
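A minimal sketch of the kind of Hive generic UDF mentioned in this role. The function name and normalization rule are hypothetical; the real logic was driven by business conventions.

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

// Hypothetical UDF: normalize a free-form state value to a two-letter code.
public class NormalizeState extends GenericUDF {
    private StringObjectInspector inputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1 || !(arguments[0] instanceof StringObjectInspector)) {
            throw new UDFArgumentException("normalize_state() expects a single string argument");
        }
        inputOI = (StringObjectInspector) arguments[0];
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object raw = arguments[0].get();
        if (raw == null) {
            return null;
        }
        String value = inputOI.getPrimitiveJavaObject(raw).trim().toUpperCase();
        // Illustrative rule only; the real mapping followed business conventions.
        return value.length() > 2 ? value.substring(0, 2) : value;
    }

    @Override
    public String getDisplayString(String[] children) {
        return "normalize_state(" + children[0] + ")";
    }
}
```

Such a UDF would be registered from HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION before being referenced in queries.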
Confidential, CA
Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources, and involved in HDFS maintenance and loading of structured and unstructured data.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Responsible for importing log files from various sources into HDFS using Flume.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Created partitions and buckets based on state for further processing with bucket-based Hive joins.
- Estimated the hardware requirements for the NameNode and DataNodes and planned the cluster.
- Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
- Consolidated customer data from lending, insurance, trading and billing systems into the data warehouse, and subsequently into data marts, for business intelligence reporting.
- Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
- Worked with the NoSQL database HBase to create tables and store data.
- Proficient in querying HBase using Impala.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Used Pig as an ETL tool for transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Designed the technical solution for real-time analytics using Kafka and HBase.
- Used Pig as an ETL tool for transformations, event joins, filters and some pre-aggregations.
- Collaborated with business users to gather requirements and build Tableau reports per business needs.
- Upgraded Apache Ambari, CDH and HDP clusters.
- Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
- Imported structured data and tables into HBase.
- Worked with different compression codecs such as LZO, GZip and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several job types such as Java MapReduce, Hive, Pig and Sqoop.
- Created a data pipeline of MapReduce programs using chained mappers.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS.
- Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
- Set up, configured and optimized the Cassandra cluster; developed a real-time Java-based application that works with the Cassandra database.
- Worked with the Cassandra database; proficient in Cassandra data modeling and analysis and CQL (Cassandra Query Language).
- Implemented optimized joins across different data sets in MapReduce to get the top claims by state.
- Converted queries to Spark SQL, using Parquet as the storage format.
- Developed an analytical component using Scala, Spark and Spark Streaming.
- Used Spark Streaming in Scala to receive real-time data from Kafka and store the stream data to HDFS.
- Wrote Spark programs in Scala and ran Spark jobs on YARN.
- Designed and implemented Solr search on top of the big data pipeline.
- Integrated Hive and HBase with Solr to build a full pipeline for data analysis.
- Synced Solr with HBase to compute indexed views for data exploration.
- Implemented MapReduce programs in Java to perform map-side joins using the distributed cache; see the sketch below.
- Developed unit test cases using the JUnit, EasyMock and MRUnit testing frameworks.
- Used in-depth Tableau features such as data blending across multiple data sources for data analysis.
- Upgraded the Hadoop cluster (HBase/ZooKeeper) from CDH3 to CDH4.
- Used Maven extensively to build MapReduce JAR files and deployed them to Amazon Web Services (AWS) on EC2 virtual servers in the cloud.
- Wrote build scripts for continuous integration systems; had exposure to AWS cloud computing (EMR, EC2 and S3 services).
- Used Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.
- Worked with BI teams to generate reports and design ETL workflows in Tableau.
- Knowledgeable about Talend for data integration.
- Created a complete processing engine, based on Cloudera's distribution, tuned for performance.
- Monitored the cluster using Cloudera Manager.
Environment: Hadoop, HDFS, HBase, MapReduce, Java, JDK 1.5, J2EE 1.4, Struts 1.3, Spark, Hive, Pig, Sqoop, Flume, Impala, Oozie, Hue, Solr, Zookeeper, Kafka, AVRO Files, SQL, ETL, DWH, Cloudera Manager, Talend, MySQL, Scala, MongoDB.
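A minimal sketch of the map-side join over the distributed cache referenced in this role; the file layouts, field positions and paths are illustrative assumptions rather than the project's actual schemas.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-side join: a small lookup table is shipped to every task through the
// distributed cache, so no reduce phase is needed. Field layouts are assumed.
public class MapSideJoin {

    public static class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> stateLookup = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // addCacheFile(... "#states") in main() symlinks the lookup file
            // as "states" in the task's working directory.
            try (BufferedReader reader = new BufferedReader(new FileReader("states"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",", 2);   // assumed: stateCode,stateName
                    stateLookup.put(parts[0], parts[1]);
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");  // assumed: claimId,stateCode,amount
            String state = stateLookup.getOrDefault(fields[1], "UNKNOWN");
            context.write(new Text(state), new Text(fields[0] + "," + fields[2]));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-side join");
        job.setJarByClass(MapSideJoin.class);
        job.setMapperClass(JoinMapper.class);
        job.setNumReduceTasks(0);                         // the join happens entirely map-side
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.addCacheFile(new URI(args[0] + "#states"));   // small lookup table on HDFS
        FileInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```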
Confidential
Java/J2EE Developer
Responsibilities:
- Worked with the business team; attended daily scrum meetings, sprint planning, sprint reviews and sprint retrospectives; and worked with the Product Owner on artifacts such as the product backlog.
- Implemented features such as logging and user session validation using the Spring AOP module.
- Used the Spring MVC framework in the business tier and the Spring BeanFactory for initializing services.
- Worked extensively with Spring IoC/dependency injection and configured cross-cutting concerns such as logging and security using Spring AOP.
- Integrated Spring and Hibernate, injecting the HibernateTemplate class into the DAOs.
- Coded numerous DAOs using HibernateDaoSupport and used Criteria, HQL and SQL as the query languages with Hibernate mappings; see the sketch below.
- Developed the data access layer using the Hibernate ORM framework.
- Developed shell scripts to call stored procedures residing on the database.
- Used XML for data exchange, XSDs for XML validation and XSLT for XML transformation.
- Consumed SOAP based web services using Spring to interact with external systems.
- Implemented an SOA architecture with web services using SOAP, WSDL and XML.
- Used Apache CXF to post messages to external vendor sites and exposed web services to other client applications such as an admin tool.
- Employed the Waterfall model and best practices for software development.
- Deployed the application in JBoss Application Server.
- Used SVN for version control and Maven build scripts for deployment.
- Implemented Java Message Service (JMS) asynchronous messaging using message-driven beans, which in turn called the EJBs.
- Used JUnit to create test cases for all the business rules and application code.
- Communicated with ILOG Rules using EJB Remote Lookup.
- Used JiBX binding to convert Java objects to XML and vice versa.
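A minimal sketch of the HibernateDaoSupport-style DAO pattern described in this role, assuming a hypothetical Account entity; the Spring XML that injects the SessionFactory is omitted.

```java
import java.util.List;

import org.springframework.orm.hibernate3.support.HibernateDaoSupport;

// DAO built on Spring's HibernateDaoSupport; the SessionFactory is injected
// through Spring configuration, and data access goes through the inherited
// HibernateTemplate.
public class AccountDao extends HibernateDaoSupport {

    public void save(Account account) {
        getHibernateTemplate().saveOrUpdate(account);
    }

    @SuppressWarnings("unchecked")
    public List<Account> findByStatus(String status) {
        // HQL executed through the injected HibernateTemplate.
        return (List<Account>) getHibernateTemplate().find(
                "from Account a where a.status = ?", status);
    }
}

// Minimal placeholder entity; the real class carried a Hibernate mapping
// (hbm.xml or annotations) not shown here.
class Account {
    private Long id;
    private String status;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}
```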
Confidential
Software Developer
Responsibilities:
- Worked with the business team; attended daily scrum meetings, sprint planning, sprint reviews and sprint retrospectives; and worked with the Product Owner on artifacts such as the product backlog.
- Implemented features such as logging and user session validation using the Spring AOP module.
- Used the Spring MVC framework in the business tier and the Spring BeanFactory for initializing services.
- Worked extensively with Spring IoC/dependency injection and configured cross-cutting concerns such as logging and security using Spring AOP.
- Integrated Spring and Hibernate, injecting the HibernateTemplate class into the DAOs.
- Coded numerous DAOs using HibernateDaoSupport and used Criteria, HQL and SQL as the query languages with Hibernate mappings.
- Developed the data access layer using the Hibernate ORM framework.
- Developed shell scripts to call stored procedures residing on the database.
- Used XML for data exchange, XSDs for XML validation and XSLT for XML transformation.
- Consumed SOAP based web services using Spring to interact with external systems.
- Implemented an SOA architecture with web services using SOAP, WSDL and XML.
- Used Apache CXF to post messages to external vendor sites and exposed web services to other client applications such as an admin tool.
- Employed the Waterfall model and best practices for software development.
- Deployed the application in JBoss Application Server.
- Used SVN for version control and Maven build scripts for deployment.
- Implemented Java Message Service (JMS) asynchronous messaging using message-driven beans, which in turn called the EJBs; see the sketch below.
- Used JUnit to create test cases for all the business rules and application code.
- Communicated with ILOG Rules using EJB Remote Lookup.
- Used JiBX binding to convert Java objects to XML and vice versa.
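A minimal sketch of the message-driven bean pattern referenced in this role; the queue name, bean name and payload handling are hypothetical.

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Message-driven bean consuming from a JMS queue asynchronously; the real
// bean delegated the payload to an EJB holding the business logic.
@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/orderEvents")
})
public class OrderEventListener implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                String payload = ((TextMessage) message).getText();
                process(payload);   // hand off to the business-logic EJB (lookup omitted)
            }
        } catch (JMSException e) {
            throw new RuntimeException("Failed to read JMS message", e);
        }
    }

    private void process(String payload) {
        // Placeholder for delegation to the session bean / business rules.
    }
}
```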
