Hadoop Developer Resume
Plano, Texas
SUMMARY
- 8 years of overall experience in designing and developing Hadoop MapReduce solutions.
- Strong experience with Big Data and Hadoop technologies, with excellent knowledge of the Hadoop ecosystem: Hive, Spark, Sqoop, Impala, Pig, HBase, Kafka, Flume, Storm, Zookeeper, Oozie.
- Designed, configured, and deployed Amazon Web Services (AWS) for a multitude of applications using the AWS stack (including EC2, Route 53, S3, RDS, CloudFormation, CloudWatch, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling.
- Worked with HDFS, NameNode, JobTracker, DataNode, TaskTracker, and MapReduce concepts.
- Expert in creating indexes, Views, complex Stored Procedures, user-defined functions, cursors, derived tables, common table expressions (CTEs) and Triggers to facilitate efficient data manipulation and data consistency.
- Expertise in SFDC Administrative tasks like creating Profiles, Roles, OWD, Field Dependencies, Custom objects, Page Layouts, Validation rules, Approvals, Workflow rules, Security and sharing rules, Delegated Administration, Tasks and actions, Public Groups, Queues.
- Experienced in Developing Triggers, Batch Apex, and Scheduled Apex classes.
- Hands-on experience with Sales Cloud, Service Cloud, Chatter, Marketing, Customer Portal, and Partner Portal; recommended solutions to improve business processes using Salesforce CRM.
- Experience in ingestion, storage, querying, processing, and analysis of Big Data, with hands-on experience in Apache Spark, Spark SQL, and Spark Streaming.
- Worked with the Spark engine to process large-scale data and created Spark RDDs.
- Knowledge of developing Spark Streaming jobs using RDDs and leveraging the Spark shell.
- Expertise in the Talend Big Data tool; involved in architectural design and development of ingestion and extraction jobs for Big Data and Spark Streaming.
- Experience with RDD architecture, implementing Spark operations on RDDs, and optimizing transformations and actions in Spark.
- Experience in generating on-demand and scheduled reports for business analysis and management decisions using SQL Server Reporting Services (SSRS).
- Worked on designing complex reports including subreports and formulas with complex logic using SQL Server Reporting Services (SSRS).
- Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera.
- Hands-on with Apache Spark jobs using Scala in a test environment for faster data processing, and used Spark SQL for querying.
- Good at analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Good at writing Hive and Impala queries to load and process data in the Hadoop Distributed File System (HDFS).
- Good understanding of NoSQL databases and hands-on work experience writing applications on NoSQL databases like Cassandra and MongoDB.
- Experience in building high-performance and scalable solutions using various Hadoop ecosystem tools like Pig, Hive, Sqoop, Spark, Solr, and Kafka.
- Defined real-time data streaming solutions across the cluster using Spark Streaming, Apache Storm, Kafka, NiFi, and Flume.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Zookeeper, and Apache Storm.
- Good knowledge of HDFS high availability (HA) and the different daemons of Hadoop clusters, including ResourceManager, NodeManager, NameNode, and DataNode.
- Experience with all stages of the SDLC and the Agile development model, from requirement gathering through deployment and production support.
- Proficient in test-driven development (TDD) and Agile practices to produce high-quality deliverables.
- Hands-on with Agile (Scrum) and Waterfall models, along with automation and enterprise tools such as Jenkins, Chef, JIRA, and Confluence to develop projects, and Git for version control.
TECHNICAL SKILLS
Hadoop Technologies: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, Zookeeper, and Oozie
Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/R Studio, SAS, Schemas, JSON, Ajax, Java, Scala, Python, Shell Scripting
NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB
Business Intelligence Tools: Tableau Server, Tableau Reader, Tableau, Splunk, SAP BusinessObjects, QlikView, Amazon Redshift, Azure Data Warehouse
Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans.
Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall.
Build Tools: Jenkins, Toad, SQL Loader, Maven, ANT, RSA, Control-M, Oozie, Hue, SOAP UI
Reporting Tools: MS Office (Word/Excel/PowerPoint/ Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.
Databases: Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, Oracle 11g/12c, DB2, Teradata, Netezza
Operating Systems: Windows, UNIX, Linux, macOS, Sun Solaris
PROFESSIONAL EXPERIENCE
Confidential, Plano, Texas
Hadoop Developer
Responsibilities:
- Installed, configured, and maintained Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper, and Sqoop.
- Created complex SQL stored procedures and developed reports using Microsoft SQL Server 2012.
- Designed SSIS (ETL) Packages to extract data from various heterogeneous data sources such as Access database, Excel spreadsheet and flat files into SQL Server.
- Created the packages in SSIS (ETL) with the help of Control Flow Containers, Tasks and Data Flow Transformations.
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
- Imported and Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Worked on importing and exporting data into HDFS and Hive using Sqoop.
- Used Flume to handle streaming data and loaded the data into Hadoop cluster.
- Developed and executed hive queries for de-normalizing the data.
- Developed an Apache Storm, Kafka, and HDFS integration project to perform real-time data analysis.
- Responsible for executing Hive queries using the Hive command line, the Hue web GUI, and Impala to read, write, and query data in HBase.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Developed bash scripts to pull T-log files from the FTP server and process them for loading into Hive tables.
- Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
- Worked on analyzing data with Hive and Pig.
- Designed an Apache Airflow entity resolution module for data ingestion into Microsoft SQL Server.
- Developed a batch processing pipeline to process data using Python and Airflow; scheduled Spark jobs using Airflow.
- Created a new Airflow DAG to find popular items in Redshift and ingest them into the main PostgreSQL DB via a web service call (see the sketch after this list).
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Involved in writing, testing, and running MapReduce pipelines using Apache Crunch.
- Managed and reviewed Hadoop log files, analyzed SQL scripts, and designed the solution for the process using Spark.
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Integrated Kafka-Spark streaming for high-efficiency throughput and reliability.
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
- Scheduled all bash scripts using the Resource Manager scheduler.
- Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
- Worked on MongoDB, HBase (NoSQL) databases which differ from classic relational databases.
- Improved the performance and optimization of existing algorithms in Hadoop using Spark context, Spark SQL, and Spark on YARN.
- Developed MapReduce programs for applying business rules to the data.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Involved in joining and data aggregation using Apache Crunch.
- Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation across multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
- Managed day-to-day operations of the cluster for backup and support.
- Defined best practices for creating Tableau dashboards by matching requirements to the charts to be chosen, choosing color patterns per users' needs, and standardizing dashboard size, look, and feel.
- Developed visualizations using sets, parameters, calculated fields, actions, sorting, filtering, and parameter-driven analysis.
- Experience in setting up Hadoop clusters on cloud platforms like AWS.
- Responsible for building scalable distributed data solutions using Hadoop cluster environment with Hortonworks distribution.
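A minimal sketch of the Airflow DAG pattern referenced above (Airflow 1.x style). The connection IDs, table and column names, and the web-service URL are hypothetical placeholders, not the actual project values:

```python
from datetime import datetime, timedelta

import requests
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.hooks.postgres_hook import PostgresHook

default_args = {"owner": "etl", "retries": 1, "retry_delay": timedelta(minutes=5)}

def find_popular_items(**context):
    # Query Redshift (reachable over the Postgres protocol) for yesterday's top items.
    # "redshift_default" and the table/column names are illustrative only.
    redshift = PostgresHook(postgres_conn_id="redshift_default")
    rows = redshift.get_records(
        """
        SELECT item_id, COUNT(*) AS views
        FROM clickstream.page_views
        WHERE view_date = CURRENT_DATE - 1
        GROUP BY item_id
        ORDER BY views DESC
        LIMIT 100
        """
    )
    context["ti"].xcom_push(key="popular_items", value=[r[0] for r in rows])

def ingest_into_postgres(**context):
    # Hand the popular items to the main PostgreSQL DB through a web-service call.
    items = context["ti"].xcom_pull(key="popular_items")
    resp = requests.post("https://example.internal/api/popular-items", json={"items": items})
    resp.raise_for_status()

with DAG(
    dag_id="popular_items_redshift_to_postgres",
    default_args=default_args,
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="find_popular_items",
        python_callable=find_popular_items,
        provide_context=True,
    )
    load = PythonOperator(
        task_id="ingest_into_postgres",
        python_callable=ingest_into_postgres,
        provide_context=True,
    )
    extract >> load
```

The two-task layout (extract from Redshift, then push via the web service) keeps each step retryable on its own, which is the main reason to split it rather than run one script.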
Environment: Apache Hadoop, HBase, Hive, Pig, Sqoop, Zookeeper, Hortonworks, NoSQL, Storm, Microsoft SQL Server 2012, ETL, YARN, Apache Airflow, MapReduce, Tableau Server 10.1, HDFS, Scala, Impala, Flume, MySQL, JDK 1.6, J2EE, JDBC, Servlets, JSP, Struts 2.0, Spring 2.0, Hibernate, Python, WebLogic, SOAP, MongoDB, Spark.
Confidential, Foster City, CA
Hadoop/Spark Developer
Responsibilities:
- Hands-on experience in Spark and Spark Streaming, creating RDDs and applying transformations and actions.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Used Spark and Spark-SQL to read the parquet data and create the tables in hive using the Scala API.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
- Developed Spark code using Scala and Spark-SQL for faster processing and testing.
- Implemented Spark sample programs in python using PySpark.
- Analyzed the SQL scripts and designed the solution to implement using PySpark.
- Developed PySpark code to mimic the transformations performed in the on-premise environment.
- Used Spark Streaming APIs to perform necessary transformations and actions on the fly to build the common learner data model, which gets data from Kafka in near real time.
- Responsible for loading data pipelines from web servers and Teradata using Sqoop with Kafka and the Spark Streaming API.
- Developed Kafka producers and consumers, Cassandra clients, and Spark components on HDFS and Hive.
- Populated HDFS and HBase with huge amounts of data using Apache Kafka.
- Used Kafka to ingest data into Spark engine.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
- Experienced with different scripting languages like Python and shell scripts.
- Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks,and performance analysis.
- Tested Apache TEZ, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Designed and implemented Incremental Imports into Hive tables and writing Hive queries to run on TEZ.
- Built data pipelines using Kafka and Akka for handling terabytes of data.
- Wrote shell scripts that run multiple Hive jobs to incrementally load different Hive tables, which are used to generate reports in Tableau for business use.
- Experienced in Apache Spark for implementing advanced procedures like text analytics and processing using the in-memory computing capabilities written in Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala,and Python.
- Worked on Spark SQL, created DataFrames by loading data from Hive tables, and stored the prepared data in AWS S3.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
- Involved in creating custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HQL (HiveQL).
- Implemented Hortonworks NiFi (HDP 2.4) and recommended a solution to ingest data from multiple data sources into HDFS and Hive using NiFi.
- Developed various data loading strategies and performed various transformations for analyzing the datasets by using Hortonworks Distribution for Hadoop ecosystem.
- Ingested data from RDBMSs, performed data transformations, and exported the transformed data to Cassandra per the business requirement; used Cassandra through Java services.
- Experience with NoSQL column-oriented databases like Cassandra and their integration with the Hadoop cluster.
- Wrote ETL jobs to read from web APIs using REST and HTTP calls and loaded the data into HDFS using Java and Talend.
- Along with the infrastructure team, involved in designing and developing a Kafka and Storm based data pipeline.
- Involved in loading and transforming large datasets from relational databases into HDFS and vice versa using Sqoop imports and exports.
- Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
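A minimal PySpark sketch of the Kafka-to-Cassandra streaming pattern described in the bullets above, assuming the spark-streaming-kafka-0-8 and spark-cassandra-connector packages are on the classpath; the broker, topic, keyspace, table, and event fields are illustrative assumptions only:

```python
import json

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # from the spark-streaming-kafka-0-8 package

sc = SparkContext(appName="learner-model-stream")
ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

# Direct stream from Kafka; broker list and topic are placeholders.
stream = KafkaUtils.createDirectStream(
    ssc, ["learner-events"], {"metadata.broker.list": "broker1:9092"}
)

def save_batch(rdd):
    """Persist one micro-batch into Cassandra via the Spark-Cassandra connector."""
    if rdd.isEmpty():
        return
    sql_ctx = SQLContext.getOrCreate(rdd.context)
    df = sql_ctx.createDataFrame(rdd)
    (df.write
       .format("org.apache.spark.sql.cassandra")
       .options(keyspace="learner", table="events")  # hypothetical keyspace/table
       .mode("append")
       .save())

# Each Kafka message value is assumed to be a JSON event.
events = (stream
          .map(lambda kv: json.loads(kv[1]))
          .map(lambda e: Row(user_id=e["user_id"],
                             course_id=e["course_id"],
                             event_time=e["timestamp"])))

events.foreachRDD(save_batch)

ssc.start()
ssc.awaitTermination()
```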
Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, Cassandra, Oozie, Shell Scripting, Scala, Maven, Java, JUnit, Agile methodologies, NiFi, MySQL, Tableau, AWS, EC2, S3, Hortonworks, Power BI.
Confidential - Herndon, VA
Hadoop Developer.
Responsibilities:
- Provided an application demo to the client by designing and developing search engine, report analysis trend, and application administration prototype screens using AngularJS and Bootstrap JS.
- Took ownership of the complete application design of the Java part and the Hadoop integration.
- Apart from the normal requirement gathering, participated in a Business meeting with the client to gather security requirements.
- Assisted the architect in analyzing the existing and future systems; prepared design blueprints and application flow documentation.
- Experienced in managing and reviewing Hadoop log files; loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources and applications; supported MapReduce programs running on the cluster.
- Responsible for working with message broker systems such as Kafka; extracted data from mainframes, fed it to Kafka, and ingested it into HBase to perform analytics.
- Wrote an event-driven link tracking system to capture user events and feed them to Kafka, which pushes them to HBase (see the sketch after this list).
- Created MapReduce jobs to extract content from HBase and configured them in an Oozie workflow to generate analytical reports.
- Developed JAX-RS web services code using the Apache CXF framework to fetch data from Solr when the user searched for documents.
- Participated in Solr schema design and ingested data into Solr for indexing.
- Wrote MapReduce programs to organize the data and make it suitable for analytics in the client-specified format.
- Wrote Python scripts to optimize performance; implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them; implemented Bloom filters in Cassandra during keyspace creation.
- Involved in writing Cassandra CQL statements; good hands-on experience in developing concurrency using Spark and Cassandra together.
- Involved in writing Spark applications using Scala; hands-on experience creating RDDs, transformations, and actions while implementing Spark applications.
- Good knowledge of creating DataFrames using Spark SQL; involved in loading data into the Cassandra NoSQL database.
- Implemented record-level atomicity on writes using Cassandra; wrote Pig scripts to query and process the datasets and identify trend patterns by applying client-specific criteria, and configured Oozie workflows to run these jobs along with the MR jobs.
- Stored the derived analysis results in HBase and made them available for ingestion into Solr for indexing.
- Involved in integrating the Java search UI, Solr, and HDFS; involved in code deployments using the Jenkins continuous integration tool.
- Documented all the challenges and issues involved in dealing with the security system and implemented best practices.
- Created project structures and configurations according to the project architecture and made them available to junior developers to continue their work.
- Handled the onsite coordinator role to deliver work to offshore; involved in code reviews and application lead support activities.
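A minimal sketch of the link-tracking producer described above, using the kafka-python client; the broker addresses, topic name, and event fields are hypothetical, and the HBase-writing consumer is assumed to run separately:

```python
import json
import time

from kafka import KafkaProducer  # kafka-python client

# Broker addresses and topic name are illustrative placeholders.
producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092"],
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
    acks="all",   # wait for full acknowledgement before considering a send complete
    retries=3,
)

def track_link_click(user_id, url, referrer):
    """Capture one user link-click event and feed it to Kafka.

    A downstream consumer (not shown) reads the topic and writes
    the events into HBase for analytics.
    """
    event = {
        "user_id": user_id,
        "url": url,
        "referrer": referrer,
        "event_type": "link_click",
        "timestamp": int(time.time() * 1000),
    }
    # Key by user so all events for one user land in the same partition.
    producer.send("user-link-events", key=str(user_id).encode("utf-8"), value=event)

if __name__ == "__main__":
    track_link_click("u123", "/product/42", "/search?q=widgets")
    producer.flush()  # make sure buffered events reach the brokers before exiting
```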
Environment: Cassandra, Spring 3.2, MVC, HTML5, CSS, AngularJS, RESTful services using the CXF web services framework, Spring Data, Solr 5.2.1, Pig, Hive, Apache Avro, MapReduce, Sqoop, Zookeeper, SVN, Jenkins, Windows AD, Windows KDC, Hortonworks distribution of Hadoop 2.3, YARN, Ambari
Confidential - NY
Hadoop Developer
Responsibilities:
- Collected and aggregated large amounts of weblog data from different sources such as webservers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
- Collected data from various Flume agents installed on various servers using multi-hop flows.
- Ingested real-time and near-real-time (NRT) streaming data into HDFS using Flume.
- Extensively involved in installation and configuration of the Cloudera distribution of Hadoop: NameNode, Secondary NameNode, JobTracker, TaskTrackers, and DataNodes.
- Developed MapReduce programs in Java and used Sqoop to pull data from the Oracle database.
- Responsible for building Scalable distributed data solutions using Hadoop. Written various Hive and Pig scripts.
- Used various HBase commands and generated different Datasets as per requirements and provided access to the data when required using grant and revoke.
- Created HBase tables to store variable data formats of input data coming from different portfolios.
- Worked on HBase to support enterprise production and loaded data into HBase using Sqoop (see the sketch after this list).
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Expertise in understanding Partitions, Bucketing concepts in Hive.
- Experience working with Apache SOLR for indexing and querying.
- Created custom SOLR Query segments to optimize ideal search matching.
- Used Oozie Scheduler system to automate the pipeline workflow and orchestrate the MapReduce Jobs that extract the data in a timely manner. Responsible for loading data from UNIX file system to HDFS.
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MRUnit testing library.
- Analyzed the weblog data using the HiveQL, integrated Oozie with the rest of the Hadoop stack
- Utilized cluster coordination services through Zookeeper.
- Worked on the Ingestion of Files into HDFS from remote systems using MFT.
- Gained good experience with various NoSQL databases and comprehensive knowledge of process improvement, normalization/denormalization, data extraction, data cleansing, and data manipulation.
- Developed Pig scripts to convert the data from Text file to Avro format.
- Created Partitioned Hive tables and worked on them using HiveQL.
- Developed Shell scripts to automate routine DBA tasks.
- Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
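A minimal sketch of the HBase row-key and column-family layout described above, shown here with the happybase Thrift client purely for illustration (the production loads went through Sqoop and MapReduce); the host, table name, column families, and row-key scheme are assumptions:

```python
import happybase  # Python HBase client over the Thrift gateway

# Host, table name, and column families are illustrative placeholders.
connection = happybase.Connection("hbase-thrift-host", port=9090)

# One-time table creation: a 'data' family for raw fields and a 'meta' family
# for portfolio/source bookkeeping.
if b"portfolio_inputs" not in connection.tables():
    connection.create_table(
        "portfolio_inputs",
        {"data": dict(max_versions=3), "meta": dict()},
    )

table = connection.table("portfolio_inputs")

# Row key combines portfolio id and record id so rows from one portfolio sort together.
table.put(
    b"PF001#rec-000042",
    {
        b"data:amount": b"1250.75",
        b"data:currency": b"USD",
        b"meta:source": b"teradata-export",
    },
)

# Scan back everything for a single portfolio using a row-key prefix.
for key, columns in table.scan(row_prefix=b"PF001#"):
    print(key, columns[b"data:amount"])

connection.close()
```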
Environment: HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, Flume, HBase, Java, Maven, Avro, Cloudera, Eclipse and Shell Scripting.
Confidential - Boca Raton, FL
SQL/SSIS/SSRS Developer
Responsibilities:
- Extracted, transformed, and loaded (ETL) source data into respective target tables to build the required data marts.
- Involved in designing ETL as part of data warehousing and loaded data into fact tables using SSIS.
- Supported the production environment by scheduling the packages and making them dynamic with SQL Server package configurations.
- Developed SSIS packages using a Foreach Loop container in Control Flow to process all Excel files within a folder, a File System Task to move files into an archive after processing, and an Execute SQL Task to insert transaction log data into the SQL table.
- Deployed SSIS Package into Production and used Package configuration to export various package properties to make package environment independent.
- Developed queries and stored procedures using T-SQL to be used by reports to retrieve information from the relational database and data warehouse.
- Worked extensively with advanced analysis actions, calculations, parameters, background images, and maps.
- Created common table expressions (CTEs) and temp tables to simplify complex queries (see the sketch after this list).
- Generated and formatted Reports using Global Variables, Expressions and Functions for the reports. Designed and implemented stylish report layouts.
- Developed Query for generating drill down and drill through reports in SSRS.
- Designed new reports and wrote technical documentation, gathered requirements, analyzed data, developed and built SSRS reports and dashboard.
- Developed SQL queries and stored procedures used by reports to retrieve information from the relational database and data warehouse.
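A minimal sketch of the CTE pattern mentioned above, shown with Python and pyodbc only as a harness (the actual report queries lived in T-SQL stored procedures and SSRS datasets); the connection string, tables, and columns are hypothetical:

```python
import pyodbc  # ODBC client used here just to run the report query from Python

# Connection string, table, and column names are illustrative placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver-host;DATABASE=ReportsDW;Trusted_Connection=yes;"
)

# A CTE that pre-aggregates order totals per customer, then joins back to the
# customer dimension -- the pattern used to keep the report queries readable.
REPORT_SQL = """
WITH order_totals AS (
    SELECT CustomerID, SUM(OrderAmount) AS TotalAmount, COUNT(*) AS OrderCount
    FROM dbo.FactOrders
    WHERE OrderDate >= DATEADD(MONTH, -1, GETDATE())
    GROUP BY CustomerID
)
SELECT c.CustomerName, t.TotalAmount, t.OrderCount
FROM dbo.DimCustomer AS c
JOIN order_totals AS t ON t.CustomerID = c.CustomerID
ORDER BY t.TotalAmount DESC;
"""

cursor = conn.cursor()
for name, total, count in cursor.execute(REPORT_SQL):
    print(f"{name}: {total} across {count} orders")

conn.close()
```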
Confidential - Fort Lauderdale, FL
Hadoop Developer
Responsibilities:
- Wrote complex queries using T-SQL to create joins, sub queries, functions and correlated sub queries to retrieve data from the database.
- Identified, tested, and resolved database performance issues (monitoring and tuning) to ensure database optimization.
- Created/Updated database objects like tables, views, stored procedures, function, packages.
- Designed MS SSIS Packages to extract data from various OLTP sources to MS SQL Server
- Built efficient SSIS packages for processing fact and dimension tables with complex transforms and Type 1 and Type 2 slowly changing dimensions.
- Created mapping tables to find out the missing attributes for the ETL process.
- Created SQL Jobs to schedule SSIS Packages
- Skilled in error and event handling: precedence constraints, breakpoints, checkpoints, and logging.
- Created VB.NET and C# scripts for data flow and error handling using the Script component in SSIS.
- Created views to facilitate easy user interface implementation, and triggers on them to facilitate consistent data entry into the database.
- Rigorously tested and debugged the Stored Procedures and used Triggers to test the validity of the data after the insert, update or delete.
- Monitored the overall performance of the database to recommend and initiate actions to improve/optimize Performance.
- Used SQL Server Profiler to trace the slow running queries and the server activity.
- Automated and enhanced daily administrative tasks including database backup and recovery.
- Involved in setting up SQL Server Agent Jobs for periodic Backups with backup devices, database maintenance plans and recovery.
- Generated drill-down, drill-through, matrix, and parameterized reports using SSRS.
- Prepared reports using calculated fields, parameters, calculations, groups, sets, and hierarchies in SSRS.
- Used Multiple Measures like Individual Axes, Blended Axes, and Dual Axes.
- Migrated reports from SSRS to PowerBI.
