Sr. Hadoop Developer Resume
Aurora, Illinois
PROFESSIONAL SUMMARY:
- Overall 8+ years of professional IT experience in the analysis, design, development, deployment, and maintenance of critical software and big data applications.
- In-depth knowledge of HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce programming.
- Expertise in converting MapReduce programs into Spark transformations using Spark RDDs (a brief Scala sketch follows this summary).
- Expertise in Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
- Experience implementing real-time event processing and analytics with Spark Streaming over messaging systems.
- Experience using Kafka and Kafka brokers to feed a Spark context and processing live streaming data with RDDs.
- Good knowledge of Amazon AWS concepts such as the EMR and EC2 web services, which provide fast and efficient processing of big data.
- Experience with all major Hadoop distributions, including Cloudera, Hortonworks, MapR, and Apache.
- Experience in installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera (5.x) distributions and on Amazon Web Services (AWS).
- Expertise in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
- Extensive experience working with Spark features such as RDD transformations, Spark MLlib, and Spark SQL.
- Hands-on experience writing Hadoop jobs to analyze data using HiveQL, Pig Latin (a data flow language), and custom MapReduce programs in Java.
- Experienced in working with structured data using HiveQL: join operations, Hive UDFs, partitioning, bucketing, and internal/external tables.
- Extensive experience collecting streaming data such as logs and storing it in HDFS using Apache Flume.
- Experienced in using Pig scripts for transformations, event joins, filters, and pre-aggregations before storing the data in HDFS.
- Created custom UDFs for Pig and Hive to bring Python/Java logic and functionality into Pig Latin and HiveQL.
- Good experience with NoSQL databases such as HBase, MongoDB, and Cassandra.
- Experience using CQL with the Cassandra Java API to retrieve data from Cassandra tables.
- Hands-on experience querying and analyzing data in Cassandra for quick searching, sorting, and grouping through CQL.
- Experience working with MongoDB for distributed storage and processing.
- Experienced in extracting data from MongoDB through Sqoop, placing it in HDFS, and processing it.
- Worked on importing data into HBase using the HBase shell and the HBase client API.
- Experience in designing and developing tables in HBase and storing aggregated data from Hive tables.
- Good knowledge of scheduling jobs in Hadoop using the FIFO, Fair, and Capacity schedulers.
- Experienced in designing both time-driven and data-driven automated workflows using Oozie and ZooKeeper.
- Experience working with Solr to develop search over unstructured data in HDFS.
- Extensively used Solr indexing to enable searches on non-primary-key columns in Cassandra keyspaces.
- Experience in writing stored procedures and complex SQL queries against relational databases such as Oracle, SQL Server, and MySQL.
- Experience in the extraction, transformation, and loading (ETL) of data from multiple sources such as flat files, XML files, and databases.
- Supported various reporting teams and have experience with the data visualization tool Tableau.
- Implemented data quality rules in the ETL tool Talend, and have good knowledge of data warehousing and ETL tools such as IBM DataStage, Informatica, and Talend.
- Experienced in, and with in-depth knowledge of, cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, Redshift, and Microsoft Azure.
- Detailed understanding of the Software Development Life Cycle (SDLC) and strong knowledge of project implementation methodologies such as Waterfall and Agile.
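Illustrative only: a minimal Scala sketch of the MapReduce-to-Spark RDD conversion pattern referenced above, using a plain-text word count as the example; the application name and HDFS paths are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountRdd {
  def main(args: Array[String]): Unit = {
    // Hypothetical app name and HDFS paths, for illustration only.
    val conf = new SparkConf().setAppName("mapreduce-to-rdd-example")
    val sc   = new SparkContext(conf)

    // The classic MapReduce word count expressed as RDD transformations:
    // flatMap/map replace the Mapper, reduceByKey replaces the Reducer.
    val counts = sc.textFile("hdfs:///data/input/sample.txt")
      .flatMap(line => line.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///data/output/wordcount")
    sc.stop()
  }
}
```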
TECHNICAL SKILLS:
Languages: C, C++, Python, R, PL/SQL, Java, HiveQL, Pig Latin, Scala, UNIX shell scripting.
Hadoop Ecosystem: HDFS, YARN, Scala, Map Reduce, Hive, Pig, Zookeeper, Sqoop, Oozie, Bedrock, Flume, Kafka, Impala, NiFi, MongoDB, HBase.
Databases: Oracle, MS SQL Server, MySQL, PostgreSQL, NoSQL (HBase, Cassandra, MongoDB), Teradata.
Tools: Eclipse, NetBeans, Informatica, IBM DataStage, Talend, Maven, Jenkins.
Hadoop Platforms: Hortonworks, Cloudera, Azure, Amazon Web Services (AWS).
Operating Systems: Windows XP/2000/NT, Linux, UNIX.
Amazon Web Services: Redshift, EMR, EC2, S3, RDS, Cloud Search, Data Pipeline, Lambda.
Version Control: GitHub, SVN, CVS.
Packages: MS Office Suite, MS Visio, MS Project Professional.
PROFESSIONAL EXPERIENCE:
Confidential, Aurora, Illinois
Sr. Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Created Kafka producers and consumers for Spark Streaming, which pulls data from the patients' different learning systems.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (see the sketch following this role).
- Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing.
- Evaluated the performance of Apache Spark in analyzing genomic data.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
- Implemented Spark RDD transformations to map business analysis logic and applied actions on top of the transformations.
- Wrote a Storm topology to accept events from the Kafka producer and emit them into Cassandra.
- Created a POC using the Spark SQL and MLlib libraries.
- Experienced in managing and reviewing Hadoop log files.
- Worked closely with EC2 infrastructure teams to troubleshoot complex issues.
- Worked with the AWS cloud and created EMR clusters with Spark to analyze and process raw data and to access data in S3 buckets.
- Involved in installing EMR clusters on AWS.
- Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
- Applied transformation rules on top of DataFrames.
- Worked with different file formats such as TextFile, Avro, ORC, and Parquet for Hive querying and processing.
- Developed Hive UDFs and UDAFs for rating aggregation.
- Developed a Java client API for CRUD and analytical operations by building a RESTful server and exposing data from NoSQL databases such as Cassandra over REST.
- Created Hive tables and involved in data loading and writing Hive UDFs.
- Worked extensively with Sqoop to move data from DB2 and Teradata to HDFS.
- Collected log data from web servers and integrated it into HDFS using Kafka.
- Provided ad-hoc queries and data metrics to business users using Hive and Impala.
- Worked on various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Implemented row-level updates and real-time analytics on Cassandra data using CQL.
- Used CQL with the Cassandra Java API to retrieve data from Cassandra tables.
- Worked on analyzing and examining customer behavioral data using Cassandra.
- Worked on Solr configuration and customizations based on requirements.
- Indexed documents using Apache Solr.
- Extensively used ZooKeeper as the job scheduler for Spark jobs.
- Worked with BI teams in generating the reports on Tableau.
- Used JIRA for bug tracking and CVS for version control.
Environment: Hadoop, MapReduce, HDFS, PIG, Hive, Sqoop, Oozie, Storm, Kafka, Spark, Spark Streaming, Scala, Cassandra, Cloudera, ZooKeeper, AWS, Solr, MySQL, Shell Scripting, Java, Tableau.
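Illustrative only: a minimal Scala sketch of the Kafka-to-HDFS Spark Streaming flow described in this role, assuming Spark 2.x with the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, and output path are placeholders, not actual project values.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToHdfsStreaming {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-example")
    val ssc  = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Placeholder Kafka settings; real broker list, group id, and topic differ.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "spark-streaming-example",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Keep only the message payloads and write each batch to HDFS as text files.
    stream.map(record => record.value)
          .saveAsTextFiles("hdfs:///data/streaming/events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```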
Confidential, Austin, Texas
Sr. Hadoop Developer
Responsibilities:
- Developed real-time data processing applications using Scala and Python and implemented Apache Spark Streaming from various streaming sources such as Kafka and JMS.
- Experienced in writing live, real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline.
- Developed Shell, Perl, and Python scripts to automate Pig scripts and provide control flow to them.
- Worked with Amazon AWS services such as EMR and EC2 for fast and efficient processing of big data.
- Involved in loading data from Linux file systems, servers, and Java web services using Kafka producers and partitions.
- Applied custom Kafka encoders for custom input formats to load data into Kafka partitions.
- Implemented a POC with Hadoop and extracted data into HDFS with Spark.
- Used Spark SQL with Scala to create DataFrames and performed transformations on them (see the sketch following this role).
- Implemented Spark SQL to access Hive tables from Spark for faster data processing.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed code to read the data stream from Kafka and send it to the appropriate bolts through their respective streams.
- Worked on Spark Streaming with Apache Kafka for real-time data processing.
- Experience in creating Kafka producers and consumers for Spark Streaming.
- Developed MapReduce jobs using the MapReduce Java API and HiveQL.
- Developed UDF, UDAF, and UDTF functions and used them in Hive queries.
- Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
- Experienced in using the Avro data serialization system to handle Avro data files in MapReduce programs.
- Experienced in optimizing Hive queries and joins to handle different data sets.
- Configured Oozie schedulers to handle different Hadoop actions on a timely basis.
- Involved in ETL, data integration, and migration by writing Pig scripts.
- Used different file formats such as text files, SequenceFiles, and Avro with Hive SerDes.
- Integrated Hadoop with Solr and implemented search algorithms.
- Experience with Storm for real-time processing.
- Hands-on experience working with the Hortonworks distribution.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Worked hands-on with NoSQL databases such as MongoDB for a POC on storing images and URIs.
- Designed and implemented MongoDB and associated RESTful web service.
- Worked on analyzing and examining customer behavioral data using MongoDB.
- Designed data aggregations in Hive for ETL processing on Amazon EMR to process data per business requirements.
- Involved in writing test cases and implementing test classes using MRUnit and mocking frameworks.
- Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
- Set up Spark on EMR to process large volumes of data stored in Amazon S3.
- Experience in processing large volumes of data and in parallel execution of processes using Talend functionality.
- Used Talend to create workflows for processing data from multiple source systems.
Environment: MapReduce, HDFS, Sqoop, LINUX, Oozie, Hadoop, Pig, Hive, Solr, Spark Streaming, Kafka, Storm, Spark, Scala, Python, MongoDB, Hadoop Cluster, Amazon Web Services, Talend.
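Illustrative only: a minimal Scala sketch of the Spark SQL / Hive DataFrame pattern described in this role, assuming Spark 2.x with Hive support enabled; the database, table, and column names are invented for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveTableAnalysis {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL read tables from the Hive metastore.
    val spark = SparkSession.builder()
      .appName("spark-sql-hive-example")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table and columns, for illustration only.
    val orders = spark.table("sales_db.orders")

    // DataFrame transformations: filter, aggregate, and order the results.
    val dailyTotals = orders
      .filter(col("status") === "COMPLETED")
      .groupBy(col("order_date"))
      .agg(sum(col("amount")).as("total_amount"), count(lit(1)).as("order_count"))
      .orderBy(col("order_date"))

    // Write the aggregate back as a Hive table for downstream reporting.
    dailyTotals.write.mode("overwrite").saveAsTable("sales_db.daily_order_totals")

    spark.stop()
  }
}
```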
Confidential, CA
Hadoop Developer
Responsibilities:
- Developed solutions to process data into HDFS (Hadoop Distributed File System), process within Hadoop and emit the summary results from Hadoop to downstream systems.
- Developed a wrapper script around the Teradata connector for Hadoop (TCD) to support optional parameters.
- Used Sqoop extensively to ingest data from various source systems into HDFS.
- Used Hive to quickly produce results for the reports that were requested.
- Played a major role in working with the team to leverage Sqoop for extracting data from Teradata.
- Imported data from different relational data sources like Oracle, Teradata to HDFS using Sqoop.
- Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
- Integrated data from multiple sources (SQL Server, DB2, Teradata) into the Hadoop cluster and analyzed it through Hive-HBase integration.
- Involved in Hive-HBase integration by creating Hive external tables and specifying HBase as the storage format.
- Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
- Developed Pig UDFs for needed functionality, such as a custom Pig loader known as the timestamp loader.
- Wrote optimized Pig scripts and was involved in developing and testing Pig Latin scripts.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and compressed CSV.
- Oozie and Zookeeper were used to automate the flow of jobs and coordination in the cluster respectively.
- Involved in moving log files generated from various sources to HDFS for further processing through Flume.
- Worked on different file formats such as text files, Parquet, SequenceFiles, Avro, and Record Columnar (RC) files.
- Developed several shell scripts that act as wrappers to start these Hadoop jobs and set the configuration parameters.
- Kerberos security was implemented to safeguard the cluster.
- Worked on a stand-alone as well as a distributed Hadoop application.
- Tested the performance of the data sets on various NoSQL databases.
- Understood complex data structures of different types (structured, semi-structured) and de-normalized them for storage in Hadoop.
Environment: Hadoop, HDFS, Pig, Flume, Hive, MapReduce, Sqoop, Oozie, Zookeeper, HBase, Java Eclipse, SQL Server, Shell Scripting.
Confidential, Minneapolis, MN
Hadoop/Java Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Migrated existing SQL queries to HiveQL queries to move to the big data analytical platform.
- Integrated the Cassandra File System with Hadoop using MapReduce to perform analytics on Cassandra data.
- Installed and configured a Cassandra DSE multi-node, multi-data-center cluster.
- Designed and implemented a 24-node Cassandra cluster for a single-point inventory application.
- Analyzed the performance of the Cassandra cluster using nodetool tpstats and cfstats for thread and latency analysis.
- Implemented real-time analytics on Cassandra data using the Thrift API.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Loaded and transformed large data sets into HDFS using Hadoop fs commands.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Implemented UDFs and UDAFs in Java and Python for Hive to handle processing that cannot be done with Hive's built-in functions (see the sketch following this role).
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop for analysis, visualization, and report generation.
- Wrote optimized Pig scripts and was involved in developing and testing Pig Latin scripts.
- Supported setting up and updating configurations for implementing scripts with Pig and Sqoop.
- Designed the logical and physical data models and wrote DML scripts for an Oracle 9i database.
- Used Hibernate ORM framework with Spring framework for data persistence.
- Wrote test cases in JUnit for unit testing of classes.
- Involved in developing templates and screens in HTML and JavaScript.
Environment: Java, HDFS, Cassandra, Map Reduce, Sqoop, JUnit, HTML, JavaScript, Hibernate, Spring, Pig, Hive.
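Illustrative only: the Hive UDF work in this role was done in Java and Python; the minimal sketch below shows the UDF mechanics in Scala (kept consistent with the other sketches), with a hypothetical package, class, and function name.

```scala
package com.example.udf // hypothetical package, for illustration only

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

/**
 * Simple Hive UDF that strips formatting from phone-number strings,
 * e.g. "(555) 123-4567" -> "5551234567".
 * Registered in Hive with (jar and function names are placeholders):
 *   ADD JAR custom-hive-udfs.jar;
 *   CREATE TEMPORARY FUNCTION clean_phone AS 'com.example.udf.CleanPhoneNumber';
 */
class CleanPhoneNumber extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.replaceAll("[^0-9]", ""))
  }
}
```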
