Sr. Hadoop Developer Resume
Chicago, IL
SUMMARY
- Around 7 years of IT experience in application development in the Java and Big Data domains.
- Experience in Hadoop and its ecosystem: HDFS, MapReduce, Apache Pig, Hive, HBase, Oozie, Scala, Spark, Flume, Kafka, Storm and Sqoop.
- Experienced with Hadoop ecosystem components such as MapReduce, Cloudera, Hortonworks, HBase, Oozie, Hive, Sqoop, Pig, Tez, Flume, Kafka, Storm, Spark, Scala, MongoDB, Couchbase and Cassandra.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs written in Java and Scala.
- Good knowledge of building Apache Spark applications using Scala.
- Good exposure to Big Data technologies and the Hadoop ecosystem, with an in-depth understanding of MapReduce and the Hadoop infrastructure.
- Excellent knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Strong experience in architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools like Spark Streaming, Spark SQL, MLlib, Kafka, Flume, MapReduce and Hive.
- Cassandra developer: set up, configured and optimized the Cassandra cluster; developed real-time Java-based applications that work with the Cassandra database.
- Proficient in Cassandra data modeling and analysis and CQL (Cassandra Query Language), with 2 years of in-depth experience with the Cassandra database.
- Hadoop administrator and developer: set up, configured and monitored Hadoop clusters and performed cluster performance tuning.
- Skilled in managing and reviewing Hadoop log files.
- Expert in importing and exporting data to and from HDFS and Hive using Sqoop.
- Experienced in loading data into Hive partitions and creating buckets in Hive.
- Experienced in configuring Flume to stream data into HDFS.
- Experienced in real-time Big Data solutions using HBase, handling billions of records.
- Extensive experience working with structured, semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Familiarity with Hadoop architecture and its components such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode and DataNode.
- Experienced in application development using Java, Hadoop, RDBMS and Linux shell scripting, as well as performance tuning.
- Expertise in writing ETL jobs for analyzing data using Pig.
- Experienced in querying data with Impala.
- Experience in data warehousing with ETL tool Oracle Warehouse Builder (OWB).
- Familiarity with the distributed coordination service ZooKeeper.
- Experienced in implementing unified data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
- Experienced in working with Solr indexing and querying.
- Excellent knowledge of Java and SQL for application development and deployment.
- In-depth understanding of data structures, algorithms and optimization.
- Very good understanding of, and hands-on work with, relational databases like MySQL and Oracle and NoSQL databases like HBase, MongoDB, Couchbase and Cassandra.
- Well versed with databases like MS SQL Server 2012 and 2008, Oracle 11g/10g/9i and MySQL.
- Passionate about working with Hadoop and Big Data technologies, data science, machine learning in Spark, Big Data processing, analytics and visualization.
- Versatile experience in utilizing Java tools in business, web and client-server environments, including the Java platform, JSP, Servlets, JavaBeans and JDBC.
- Expertise in developing presentation-layer components with HTML, CSS, JavaScript, jQuery, XML, JSON, AJAX and D3.
- Experienced with source control repositories such as SVN and GitHub.
- Good knowledge of analyzing data in HBase using Hive and Pig.
- Experienced in detailed system design using use-case analysis, functional analysis, and UML modeling with class, sequence, activity and state diagrams.
- Worked with data warehouse architecture and design: star schema, snowflake schema, fact and dimension tables, and physical and logical data modeling.
- Designed mapping documents for Big Data applications.
- Experienced in Agile and Scrum.
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Spark, Spark Streaming, Spark SQL, Kafka, Cloudera CDH4/CDH5, Hortonworks, Hadoop Streaming, ZooKeeper, Oozie, Sqoop, Flume, Impala, NiFi, Solr, Tez, Ranger, Talend, Tableau/QlikView
NoSQL: HBase, MongoDB, Couchbase, Neo4j, Cassandra
Languages: Java/ J2EE, SQL, Shell Scripting, C/C++, Python, Scala
Web Technologies: HTML, JavaScript, CSS, XML, Servlets, SOAP, Amazon AWS, Google App Engine
Web/Application Servers: Apache Tomcat, LDAP, JBoss, IIS
Operating Systems: Windows, Mac OS, Linux and Unix
Frameworks: Spring, MVC, Hibernate, Swing
DBMS / RDBMS: Oracle 11g/10g/9i, SQL Server 2012/2008, MySQL
IDEs: Eclipse, Microsoft Visual Studio (2008, 2012), NetBeans, Spring Tool Suite
Version Control: SVN, CVS, Rational ClearCase Remote Client, GitHub, Visual Studio
Tools: FileZilla, PuTTY, TOAD SQL Client, MySQL Workbench, ETL, DWH, JUnit, Oracle SQL Developer, WinSCP, Tahiti, Cygwin
PROFESSIONAL EXPERIENCE
Confidential, Chicago IL
Sr. Hadoop Developer
Responsibilities:
- Moved data into the data lake using Sqoop, cleansed it with Pig and loaded it into Hive tables.
- Transformed SQL stored procedures into recursive Hive queries built from several levels of joins so they would work in the data lake.
- Moved files using WebHDFS URLs to find and pull the files from the source (MapR) to the destination (Hortonworks cluster); see the sketch below.
- Performed optimization at all levels of the data lake: Raw, Refined and Enriched.
- Moved the data from MapR to Hortonworks during the migration.
- Prepared XML files and automated the ingestion jobs using Oozie and Falcon.
- Ran and optimized Pig frameworks to cleanse the data.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Worked on a supply chain project to ingest data from source systems into the data lake.
- Managed and reviewed Oozie and Falcon log files and made the updates needed to view them in the dashboard visualization.
- Worked with solution design teams to arrive at the optimal solution and the appropriate set of tools for each specific data source.
- Developed use cases for processing real-time streaming data using tools like Spark Streaming.
- Worked with the due diligence team to explore whether NiFi was a feasible option for our solution.
- Processed streaming data using ingestion tools like Kafka and Flume.
- Developed a data retention mechanism and automation to purge archived data using shell scripts and Oozie.
- Worked in both Agile and Waterfall methodologies.
Environment: Hadoop, HDFS, Hive, MapReduce, Shell, Spark, TEZ, Pig, Sqoop, Flume, Kafka, Storm, Nifi, HBase, Oozie, Falcon, MapR, Hortonworks
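The cross-cluster file movement described in this role can be sketched with the Hadoop FileSystem API over WebHDFS. This is a minimal, illustrative sketch only; the host names, ports and paths below are placeholders, not project details.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Minimal sketch: copy files from a source cluster (MapR) to a destination
// cluster (Hortonworks) through their WebHDFS endpoints. All endpoints and
// paths below are placeholders.
public class WebHdfsCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        FileSystem source = FileSystem.get(new URI("webhdfs://mapr-edge.example.com:14000"), conf);
        FileSystem target = FileSystem.get(new URI("webhdfs://hdp-namenode.example.com:50070"), conf);

        Path sourceDir = new Path("/data/raw/supply_chain");
        Path targetDir = new Path("/datalake/raw/supply_chain");

        // Find the files on the source side and copy each one across without
        // deleting the original (deleteSource = false).
        for (FileStatus status : source.listStatus(sourceDir)) {
            if (status.isFile()) {
                FileUtil.copy(source, status.getPath(),
                        target, new Path(targetDir, status.getPath().getName()),
                        false, conf);
            }
        }

        source.close();
        target.close();
    }
}
```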
Confidential, Houston TX
Sr. Hadoop Developer
Responsibilities:
- Moved relational database data into the data lake and Hive dynamic-partition tables using Sqoop as the ETL tool.
- Transformed SQL stored procedures into recursive Hive queries built from several levels of joins so they would work in the data lake.
- Optimized Hive queries using partitioning and bucketing techniques, with ACID properties, to control data distribution.
- Ran Hive queries through different execution engines: Spark, MapReduce and Tez.
- Worked with Hive and the NoSQL database HBase to create tables and store data.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Connected JSON web services built on REST and SOAP APIs to the databases and extracted data using Java.
- Used the Hortonworks distribution of Hadoop.
- Used Pig as an ETL tool for transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Used Kerberos for authentication and Ranger for access control.
- Used distributed copy (DistCp) for intra-cluster data transfer.
- Used the Oozie workflow engine and coordinators to manage timed, interdependent Hadoop jobs and to automate several job types such as Java MapReduce, Hive and Sqoop.
- Developed Oozie workflows to automate the tasks of loading data into HDFS and processing it with Sqoop and Hive.
- Tuned Hive performance for the most efficient results.
- Used MapReduce programs with chained mappers to create data pipelines.
- Implemented various optimization techniques in Hive on Tez for effective result sets.
- Implemented optimized joins across different data sets in MapReduce to get the top claims by state.
- Created a complete processing engine, based on the Hortonworks distribution, tuned for performance.
- Monitored cluster health using Ambari.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Consolidated employee data from clients, consultants and employees around the globe into the data lake and visualized the resource trend report through QlikView.
- Imported large data sets from MySQL into HDFS using Sqoop on a regular basis.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Created partitions and buckets based on state for further processing with bucket-based Hive joins.
- Saved storage space on HDFS using compression codecs such as LZO, GZip, ZLib and Snappy.
- Compressed the data to fit into the infrastructure with minimal hardware requirements.
- Created Hive generic (one-to-many and many-to-one) UDFs, UDAFs and UDTFs in Java and Python to process business logic that varies by convention; see the sketch below.
- Worked on shell scripting in Linux on the cluster and used shell scripts to run Hive queries from Beeline.
- Recreated SQL stored-procedure logic in HiveQL to run analytics on the imported data.
- Used various transformations such as Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure and Union.
- Worked on improving performance of Hive and Pig queries.
- Supported setting up the QA environment and updating configurations to implement Pig and Sqoop scripts.
- Designed a conceptual model with Spark for performance optimization.
- Implemented custom MapReduce partitioners and custom Writables.
- Implemented test scripts to support test driven development and continuous integration.
- Trained and mentored the analyst/test team in writing, running and validating Sqoop scripts and Hive queries.
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats to text files.
- Implemented a script to transmit sys print information from MySQL to HBase using Sqoop.
- Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Wrote Scala programs that run on Spark and worked with the Hue interface for querying the data.
- Developed Spark jobs in Scala in the test environment for faster data processing and used Spark SQL for querying.
- Worked with Talend on a POC to integrate data from the data lake.
- Worked in a Linux/Unix environment.
- Worked in Agile development team environment, in Sprints with daily scrum meetings.
- Wrote technical documentation covering the whole environment.
Environment: Hadoop, HDFS, HBase, MapReduce, Java, Python, REST, Spark, Hive, Beeline, TEZ, Pig, Sqoop, Flume, Oozie, Hue, Zookeeper, Ambari, Scala, SQL, ETL, DWH, Hortonworks, Ranger, Talend, MySQL.
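A minimal sketch of the kind of Hive generic UDF mentioned in this role. The function name and normalization rule are hypothetical; the real logic was driven by business conventions.

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

// Hypothetical UDF: normalize a free-form state value to a two-letter code.
public class NormalizeState extends GenericUDF {
    private StringObjectInspector inputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1 || !(arguments[0] instanceof StringObjectInspector)) {
            throw new UDFArgumentException("normalize_state() expects a single string argument");
        }
        inputOI = (StringObjectInspector) arguments[0];
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object raw = arguments[0].get();
        if (raw == null) {
            return null;
        }
        String value = inputOI.getPrimitiveJavaObject(raw).trim().toUpperCase();
        // Illustrative rule only; the real mapping followed business conventions.
        return value.length() > 2 ? value.substring(0, 2) : value;
    }

    @Override
    public String getDisplayString(String[] children) {
        return "normalize_state(" + children[0] + ")";
    }
}
```

Such a UDF would be registered from HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION before being referenced in queries.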
Confidential, CA
Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources, and involved in HDFS maintenance and loading of structured and unstructured data.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Responsible for importing log files from various sources into HDFS using Flume.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Created partitions and buckets based on state for further processing with bucket-based Hive joins.
- Estimated the hardware requirements for the NameNode and DataNodes and planned the cluster.
- Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
- Consolidated customer data from lending, insurance, trading and billing systems into the data warehouse, and subsequently into data marts, for business intelligence reporting.
- Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
- Worked with the NoSQL database HBase to create tables and store data.
- Proficient in querying HBase using Impala.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Used Pig as an ETL tool for transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Designed the technical solution for real-time analytics using Kafka and HBase.
- Used Pig as an ETL tool for transformations, event joins, filters and some pre-aggregations.
- Collaborated with business users to gather requirements and build Tableau reports per business needs.
- Upgraded Apache Ambari, CDH and HDP clusters.
- Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
- Imported structured data and tables into HBase.
- Worked with different compression codecs such as LZO, GZip and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several job types such as Java MapReduce, Hive, Pig and Sqoop.
- Created a data pipeline of MapReduce programs using chained mappers.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS.
- Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
- Set up, configured and optimized the Cassandra cluster; developed a real-time Java-based application that works with the Cassandra database.
- Worked with the Cassandra database; proficient in Cassandra data modeling and analysis and CQL (Cassandra Query Language).
- Implemented optimized joins across different data sets in MapReduce to get the top claims by state.
- Converted queries to Spark SQL, using Parquet as the storage format.
- Developed an analytical component using Scala, Spark and Spark Streaming.
- Used Spark Streaming in Scala to receive real-time data from Kafka and store the stream data to HDFS.
- Wrote Spark programs in Scala and ran Spark jobs on YARN.
- Designed and implemented Solr search on top of the big data pipeline.
- Integrated Hive and HBase with Solr to build a full pipeline for data analysis.
- Synced Solr with HBase to compute indexed views for data exploration.
- Implemented MapReduce programs in Java to perform map-side joins using the distributed cache; see the sketch below.
- Developed unit test cases using the JUnit, EasyMock and MRUnit testing frameworks.
- Used in-depth Tableau features such as data blending across multiple data sources for data analysis.
- Upgraded the Hadoop cluster (HBase/ZooKeeper) from CDH3 to CDH4.
- Used Maven extensively to build MapReduce JAR files and deployed them to Amazon Web Services (AWS) on EC2 virtual servers in the cloud.
- Wrote build scripts for continuous integration systems; had exposure to AWS cloud computing (EMR, EC2 and S3 services).
- Used Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.
- Worked with BI teams to generate reports and design ETL workflows in Tableau.
- Knowledgeable about Talend for data integration.
- Created a complete processing engine, based on Cloudera's distribution, tuned for performance.
- Monitored the cluster using Cloudera Manager.
Environment: Hadoop, HDFS, HBase, MapReduce, Java, JDK 1.5, J2EE 1.4, Struts 1.3, Spark, Hive, Pig, Sqoop, Flume, Impala, Oozie, Hue, Solr, Zookeeper, Kafka, AVRO Files, SQL, ETL, DWH, Cloudera Manager, Talend, MySQL, Scala, MongoDB.
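A minimal sketch of the map-side join over the distributed cache referenced in this role; the file layouts, field positions and paths are illustrative assumptions rather than the project's actual schemas.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-side join: a small lookup table is shipped to every task through the
// distributed cache, so no reduce phase is needed. Field layouts are assumed.
public class MapSideJoin {

    public static class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> stateLookup = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // addCacheFile(... "#states") in main() symlinks the lookup file
            // as "states" in the task's working directory.
            try (BufferedReader reader = new BufferedReader(new FileReader("states"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",", 2);   // assumed: stateCode,stateName
                    stateLookup.put(parts[0], parts[1]);
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");  // assumed: claimId,stateCode,amount
            String state = stateLookup.getOrDefault(fields[1], "UNKNOWN");
            context.write(new Text(state), new Text(fields[0] + "," + fields[2]));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-side join");
        job.setJarByClass(MapSideJoin.class);
        job.setMapperClass(JoinMapper.class);
        job.setNumReduceTasks(0);                         // the join happens entirely map-side
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.addCacheFile(new URI(args[0] + "#states"));   // small lookup table on HDFS
        FileInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```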
Confidential
Java/J2EE Developer
Responsibilities:
- Worked with the business team; attended daily scrum meetings, sprint planning, sprint reviews and sprint retrospectives; and worked with the Product Owner on artifacts such as the product backlog.
- Implemented features such as logging and user session validation using the Spring AOP module.
- Used the Spring MVC framework in the business tier and the Spring BeanFactory for initializing services.
- Worked extensively with Spring IoC/dependency injection and configured cross-cutting concerns such as logging and security using Spring AOP.
- Integrated Spring and Hibernate, injecting the HibernateTemplate class into the DAOs.
- Coded numerous DAOs using HibernateDaoSupport and used Criteria, HQL and SQL as the query languages with Hibernate mappings; see the sketch below.
- Developed the data access layer using the Hibernate ORM framework.
- Developed shell scripts to call stored procedures residing on the database.
- Used XML for data exchange, XSDs for XML validation and XSLT for XML transformation.
- Consumed SOAP based web services using Spring to interact with external systems.
- Implemented an SOA architecture with web services using SOAP, WSDL and XML.
- Used Apache CXF to post messages to external vendor sites and exposed web services to other client applications such as an admin tool.
- Employed the Waterfall model and best practices for software development.
- Deployed the application in JBoss Application Server.
- Used SVN for version control and Maven build scripts for deployment.
- Implemented Java Message Service (JMS) asynchronous messaging using message-driven beans, which in turn called the EJBs.
- Used JUnit to create test cases for all the business rules and application code.
- Communicated with ILOG Rules using EJB Remote Lookup.
- Used JiBX binding to convert Java objects to XML and vice versa.
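A minimal sketch of the HibernateDaoSupport-style DAO pattern described in this role, assuming a hypothetical Account entity; the Spring XML that injects the SessionFactory is omitted.

```java
import java.util.List;

import org.springframework.orm.hibernate3.support.HibernateDaoSupport;

// DAO built on Spring's HibernateDaoSupport; the SessionFactory is injected
// through Spring configuration, and data access goes through the inherited
// HibernateTemplate.
public class AccountDao extends HibernateDaoSupport {

    public void save(Account account) {
        getHibernateTemplate().saveOrUpdate(account);
    }

    @SuppressWarnings("unchecked")
    public List<Account> findByStatus(String status) {
        // HQL executed through the injected HibernateTemplate.
        return (List<Account>) getHibernateTemplate().find(
                "from Account a where a.status = ?", status);
    }
}

// Minimal placeholder entity; the real class carried a Hibernate mapping
// (hbm.xml or annotations) not shown here.
class Account {
    private Long id;
    private String status;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}
```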
Confidential
Software Developer
Responsibilities:
- Worked with the business team; attended daily scrum meetings, sprint planning, sprint reviews and sprint retrospectives; and worked with the Product Owner on artifacts such as the product backlog.
- Implemented features such as logging and user session validation using the Spring AOP module.
- Used the Spring MVC framework in the business tier and the Spring BeanFactory for initializing services.
- Worked extensively with Spring IoC/dependency injection and configured cross-cutting concerns such as logging and security using Spring AOP.
- Integrated Spring and Hibernate, injecting the HibernateTemplate class into the DAOs.
- Coded numerous DAOs using HibernateDaoSupport and used Criteria, HQL and SQL as the query languages with Hibernate mappings.
- Developed the data access layer using the Hibernate ORM framework.
- Developed shell scripts to call stored procedures residing on the database.
- Used XML for data exchange, XSDs for XML validation and XSLT for XML transformation.
- Consumed SOAP based web services using Spring to interact with external systems.
- Implemented an SOA architecture with web services using SOAP, WSDL and XML.
- Used Apache CXF to post messages to external vendor sites and exposed web services to other client applications such as an admin tool.
- Employed the Waterfall model and best practices for software development.
- Deployed the application in JBoss Application Server.
- Used SVN for version control and Maven build scripts for deployment.
- Implemented Java Message Service (JMS) asynchronous messaging using message-driven beans, which in turn called the EJBs; see the sketch below.
- Used JUnit to create test cases for all the business rules and application code.
- Communicated with ILOG Rules using EJB Remote Lookup.
- Used JiBX binding to convert Java objects to XML and vice versa.
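A minimal sketch of the message-driven bean pattern referenced in this role; the queue name, bean name and payload handling are hypothetical.

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Message-driven bean consuming from a JMS queue asynchronously; the real
// bean delegated the payload to an EJB holding the business logic.
@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/orderEvents")
})
public class OrderEventListener implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                String payload = ((TextMessage) message).getText();
                process(payload);   // hand off to the business-logic EJB (lookup omitted)
            }
        } catch (JMSException e) {
            throw new RuntimeException("Failed to read JMS message", e);
        }
    }

    private void process(String payload) {
        // Placeholder for delegation to the session bean / business rules.
    }
}
```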
