Hadoop Engineer Lead Resume
NY
SUMMARY
- 11+ years of diversified experience in software design and development, including experience as a Hadoop developer solving business use cases for several clients, with expertise in backend applications.
TECHNICAL SKILLS
Programming Languages: Java, Scala, Python
Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, ZooKeeper, HBase, Impala, Oozie, Hue, MongoDB
ETL Tools: Informatica PowerCenter 6.1/7.1/9.1
Operating Systems: MS-DOS, Windows 95/98/NT/XP/7, Linux, UNIX
Web Technologies: JSP, JDBC, CSS
Databases: Oracle, MySQL
Application/Web Servers: Apache Tomcat 4.0, WebLogic, TFS
Testing and Tracking Tools: QuickTest Pro, Selenium, LoadRunner, Quality Center, HP ALM, JIRA
PROFESSIONAL EXPERIENCE
Confidential, NY
Hadoop Engineer Lead
Environment: Windows 7/Linux, Hadoop 2.0, YARN, SharePoint 2014, Amazon S3, Hive, Pig, MapReduce, Impala, Sqoop, Flume, MongoDB, ZooKeeper, Kafka, HBase, PuTTY, MySQL, Cloudera, Agile, shell scripting, Java, Scala
Responsibilities:
- Set up a multi-cluster environment running CDH 5.11.
- Worked with highly unstructured and semi-structured data, 2,048 TB in size.
- Set up both prod and dev nodes (120 nodes in total); HDFS storage spans 80 nodes with 12 disks per node at 3.7 TB each.
- Loaded data from external sources into target tables using Impala queries.
- Set up a MongoDB cluster, including user roles and security.
- Developed JVM applications with Scala and Java.
- Extensive understanding of Hortonworks Data Platform (HDP 2.6).
- Created an Oozie workflow for data ingestion that runs weekly.
- Performed sentiment analysis with Spark Streaming: ingested Facebook, Twitter, and Instagram data through Flume, processed the jobs in Spark (Java/Scala), and stored the results in HBase and MongoDB.
- Read and wrote Hive data through Spark SQL and DataFrames.
- Set up monitoring, alerts, and events on the CDH cluster with Linux shell scripting.
- Configured and monitored all services running on the cluster, including HBase, Flume, Impala, Hive, Pig, and Kafka.
- Secured the cluster with Kerberos.
- Used the Spark MLlib machine learning library to train models for sentiment analysis on social media data.
- Processed customer feedback data from Press Ganey surveys with Spark (Scala) and stored it in Hive tables for further analysis in Tableau.
- Imported data from RDBMS sources with Sqoop jobs and troubleshot any jobs that failed to finish (see the ingestion sketch after this list).
- Handled the complete ETL process, loading external vendor data into target tables.
- Implemented GitHub as the source versioning and code migration repository.
- Pulled files from Amazon S3 buckets to the prod cluster and loaded them into Amazon Redshift.
- Granted internal and external users access to Cloudera Manager, Hue, and Oozie by creating SR requests.
- Granted and revoked user privileges on both the dev and prod clusters.
- Ran daily health tests on the services running on Cloudera.
- Downloaded data from FTP to the local clusters with shell scripts and loaded it into target tables using partitioning and bucketing techniques.
- Scheduled jobs in crontab and created Oozie workflows and coordinators (see the scheduling sketch after this list).
- Monitored all crontab and Oozie jobs and debugged issues when any job failed to complete.
- Performed data analytics and reporting with Tableau.
- Sent a daily status report of cluster services to the Scrum Master.
- Created documentation for all jobs running on the cluster.
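A minimal sketch of the RDBMS-to-Hive ingestion described above, assuming a hypothetical MySQL source table (orders) and Hive staging table (sales.orders_stg); the connection string, credentials, paths, and column names are illustrative, not the actual project code.

    #!/bin/bash
    # Incremental Sqoop import from a hypothetical MySQL source into HDFS,
    # then load the new files into a date-partitioned Hive table.
    set -euo pipefail

    LOAD_DATE=$(date +%Y-%m-%d)

    # Pull only rows added since the last recorded value of the key column
    # (last value tracked in a simple local file for this sketch).
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user --password-file /user/etl/.dbpass \
      --table orders \
      --incremental append --check-column order_id --last-value "$(cat /tmp/last_value)" \
      --target-dir /data/staging/orders/${LOAD_DATE} \
      --num-mappers 4

    # Move the imported files into the matching partition of the target table.
    hive -e "
      LOAD DATA INPATH '/data/staging/orders/${LOAD_DATE}'
      INTO TABLE sales.orders_stg PARTITION (load_date='${LOAD_DATE}');
    "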
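And a sketch of the scheduling setup mentioned above: a crontab entry driving a wrapper script, plus the equivalent Oozie coordinator submission. The host, paths, property file, and job id are assumptions for illustration.

    # Crontab entry: run the ingestion wrapper every Sunday at 02:00.
    0 2 * * 0 /opt/etl/bin/run_ingestion.sh >> /var/log/etl/ingestion.log 2>&1

    # Equivalent Oozie submission: a coordinator described in a properties file
    # handles the weekly schedule and workflow orchestration.
    oozie job -oozie http://oozie-host:11000/oozie \
      -config /opt/etl/conf/ingestion-coord.properties -run

    # Check the status of a running coordinator/workflow by its job id.
    oozie job -oozie http://oozie-host:11000/oozie -info 0000123-200101000000001-oozie-oozi-C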
Confidential, Ridgefield, NJ
Big Data/Hadoop Engineer Lead
Environment: Windows 7/Linux, SharePoint 2014, Hadoop 2.0, Hive, Pig, MapReduce, Sqoop, ZooKeeper, TFS, VS 2015, PuTTY, MySQL, Cloudera, Agile, Teradata, shell scripting, Java, Scala, Sentry
Responsibilities:
- Worked on a live 60-node Hadoop cluster running CDH 5.8.
- Worked with highly unstructured and semi-structured data, 1,900 TB in size (270 GB with a replication factor of 3).
- Extracted data from Teradata/RDBMS into HDFS using Sqoop (version 1.4.6).
- Created and ran Sqoop (version 1.4.6) jobs with incremental load to populate Hive external tables.
- Set up Amazon Redshift and imported data from RDBMS sources into Redshift.
- Worked on Spark Streaming with Kafka; ingested data with Spark SQL and created Spark DataFrames.
- Wrote extensive Pig (version 0.15) scripts to transform raw data from several big data sources into baseline datasets.
- Developed Hive (version 1.2.1) scripts for end-user/analyst requirements to perform ad hoc analysis.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external tables to optimize performance (see the Hive DDL sketch after this list).
- Solved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate into MapReduce jobs.
- Strong knowledge of multi-cluster environments and setting up the Cloudera Hadoop ecosystem; experienced in installation, configuration, and management of Hadoop clusters.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed big data/Hadoop logs.
- Developed shell scripts for Oozie workflows.
- Used the reporting features of Talend to generate BI reports.
- Integrated Talend with Hadoop to process big data jobs.
- Good knowledge of Solr and Kafka.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Installed and configured Storm, Solr, Flume, Sqoop, Pig, Hive, HBase on Hadoop clusters.
- Provided detailed reporting of work as required by project status reports.
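A minimal sketch of the partitioning and bucketing approach referenced above, assuming a hypothetical clickstream table; the HiveServer2 URL, columns, HDFS location, and bucket count are illustrative only.

    # Create an external Hive table partitioned by load date and bucketed by user_id.
    beeline -u jdbc:hive2://hiveserver:10000/default -e "
      CREATE EXTERNAL TABLE IF NOT EXISTS clickstream_events (
        user_id  BIGINT,
        url      STRING,
        event_ts TIMESTAMP)
      PARTITIONED BY (load_date STRING)
      CLUSTERED BY (user_id) INTO 32 BUCKETS
      STORED AS ORC
      LOCATION '/data/warehouse/clickstream_events'"

    # Filtering on the partition column prunes the scan to a single HDFS directory.
    beeline -u jdbc:hive2://hiveserver:10000/default \
      -e "SELECT COUNT(*) FROM clickstream_events WHERE load_date = '2016-05-01'"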
Confidential, Indianapolis, IN
Big Data/ Hadoop Developer
Environment: Windows 7/Linux/Unix, SharePoint 2013, Hadoop 2.0, Eclipse, Hive, Pig, MapReduce, Sqoop, HBase, ZooKeeper, HP ALM, PuTTY, Oracle/Teradata, Cloudera, Agile
Responsibilities:
- Designed and implemented ETL processes in big data/Hadoop ecosystems.
- Hands-on experience with Cloudera and migrating big data from Oracle with Sqoop (version 1.4.3).
- Very good experience with both MapReduce 1 (JobTracker/TaskTracker) and MapReduce 2 (YARN).
- Imported data from Teradata/Oracle with Sqoop (version 1.4.3).
- Implemented a de-duplication process to avoid duplicates in the daily load.
- Developed several advanced MapReduce (Java) and Python programs as part of functional requirements for big data.
- Developed Hive (version 1.1.1) scripts as part of functional requirements and implemented Hadoop security with Kerberos (see the Kerberos access sketch after this list).
- Worked with the admin team on designing the upgrade from CDH 3 to CDH 4.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Experience using SequenceFile, RCFile, and Avro file formats.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Processed and analyzed ETL/big data jobs in Talend.
- Tested and generated reports in Talend.
- Successfully integrated Hive tables with MySQL database.
- Worked in a UNIX-based environment for data operations.
- Experience working on NoSQL databases, including HBase.
- Deployed code changes using TeamCity builds.
- Involved in handling code fixes during production release.
- Managed Hadoop clusters, including adding and removing cluster nodes for maintenance and capacity needs.
- Provide detailed reporting of work as required by project status reports.
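A short sketch of how a batch job authenticates against the Kerberized cluster mentioned above; the keytab path, principal names, and realm are assumptions for illustration.

    #!/bin/bash
    # Obtain a Kerberos ticket from a service keytab, then connect to HiveServer2
    # using the Hive service principal.
    set -euo pipefail

    kinit -kt /etc/security/keytabs/etl.service.keytab etl/etl-host@EXAMPLE.COM

    # Confirm the ticket before the job runs.
    klist

    beeline -u "jdbc:hive2://hiveserver:10000/default;principal=hive/_HOST@EXAMPLE.COM" \
      -e "SHOW DATABASES"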
Confidential, Folsom, CA
Big Data/ Hadoop Developer
Environment: Windows 7, SharePoint 2013, Hadoop 1.0, Eclipse, Pig, Hive, Flume, Sqoop, HBase, PuTTY, HP ALM, WinSCP, Agile, MySQL
Responsibilities:
- Built scalable distributed data solutions using Hadoop and migrated legacy retail ETL applications to Hadoop.
- Accessed equipment data delivered over mobile networks and satellite links.
- Hands-on extraction of data from different databases and copying it into HDFS using Sqoop.
- Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
- Hands-on creation of applications on social networking websites to obtain access to their data.
- Developed simple to complex MapReduce jobs using Hive and Pig for analyzing the data.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources to derive results from the data.
- Worked with cloud services such as Amazon Web Services (AWS).
- Used Oozie workflow engine to run multiple Hive and Pig jobs.
- Exported the analyzed data into relational databases using Sqoop for visualization and report generation by the BI team (see the Sqoop export sketch after this list).
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
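A brief sketch of the Sqoop export step called out above, assuming the analyzed results sit in an HDFS output directory and land in a hypothetical MySQL reporting table; database, table, and path names are illustrative.

    # Export aggregated results from HDFS into a MySQL table consumed by the BI team.
    sqoop export \
      --connect jdbc:mysql://reportdb:3306/bi \
      --username bi_user --password-file /user/etl/.bipass \
      --table daily_sales_summary \
      --export-dir /data/output/daily_sales_summary \
      --input-fields-terminated-by '\t' \
      --num-mappers 4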
Confidential, Warren, MI
ETL/ Informatica/Java Developer
Environment: Windows XP/NT, Informatica 6.2, UNIX, Java, Oracle, SQL, PL/SQL
Responsibilities:
- Participated in documenting the existing operational systems.
- Involved in the requirements gathering for the warehouse. Presented the requirements and a design document to the client.
- Created ETL jobs to load data from the staging area into the data warehouse.
- Analyzed the requirements and framed the business logic for the ETL process.
- Involved in the ETL design and its documentation.
- Experience with manufacturing-unit activities such as planning, purchasing, and sales.
- Involved in creating use cases for purchase and sale.
- Designed and developed complex aggregate, join, and lookup transformation rules (business rules) to generate consolidated (fact/summary) data using Informatica PowerCenter 6.0.
- Designed and developed mappings using Source Qualifier, Aggregator, Joiner, Lookup, Sequence Generator, Stored Procedure, Expression, Filter, and Rank transformations.
- Developed pre-session, post-session, and batch execution routines using Informatica Server to run Informatica sessions (see the wrapper-script sketch after this list).
- Evaluated the level of granularity
- Evaluated slowly changing dimension tables and its impact to the overall Data Warehouse including changes to Source-Target mapping, transformation process, database, etc.
- Collected and linked metadata from diverse sources, including Oracle relational databases, XML, and flat files.
- Created, optimized, reviewed, and executed Teradata SQL test queries to validate transformation rules used in source to target mappings/source views, and to verify data in target tables
- Extensive experience with PL/SQL in designing, developing functions, procedures, triggers and packages.
- Developed Informatica mappings, reusable sessions, and mapplets for data loads to the data warehouse.
- Designed and developed Informatica mappings and workflows; identified and removed bottlenecks to improve the performance of mappings and workflows, and used the Debugger to test mappings and fix bugs.
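A hedged sketch of the batch execution routine mentioned above: a UNIX wrapper that runs a pre-session step, launches an Informatica workflow with pmcmd, and checks the result. The service, domain, folder, workflow, and file names are assumptions, and the pmcmd flags shown follow later PowerCenter releases; the 6.x syntax of the era differed.

    #!/bin/sh
    # Pre-session step, workflow launch via pmcmd, and a simple post-session check.

    # Pre-session: stage the source flat file where the session expects it.
    cp /data/incoming/orders_$(date +%Y%m%d).dat /infa/srcfiles/orders.dat

    # Launch the workflow and wait for it to finish.
    pmcmd startworkflow -sv INT_SVC -d DOMAIN_DEV -u etl_user -p "$INFA_PASS" \
      -f SALES_DW -wait wf_load_orders
    RC=$?

    # Post-session: archive the processed file only if the workflow succeeded.
    if [ "$RC" -eq 0 ]; then
      mv /infa/srcfiles/orders.dat /data/archive/orders_$(date +%Y%m%d).dat
    else
      echo "Workflow wf_load_orders failed with code $RC" | mail -s "ETL failure" etl-oncall@example.com
      exit "$RC"
    fi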
Confidential
Software Engineer
Environment: Informatica 6.1, PL/SQL, MS Access, Oracle, Windows, UNIX, Java 1.8, RESTful web services, SOA, Spring, Ajax, JavaScript, CSS 3, JSP, Servlet, JSTL, JPA, Hibernate, JUnit, MySQL, Tomcat, JSON
Responsibilities:
- Worked extensively with data modelers to implement logical and physical data modeling for an enterprise-level data warehouse.
- Created and Modified T-SQL stored procedures for data retrieval from MS SQL SERVER database.
- Automated mappings to run using UNIX shell scripts, which included Pre and Post-session jobs and extracted data from Transaction System into Staging Area.
- Extensively used Informatica PowerCenter 6.1/6.2 to extract data from various sources and load it into the staging database.
- Worked extensively with Informatica tools - Source Analyzer, Warehouse Designer, Transformation Developer, Mapplet Designer, Mapping Designer, Repository Manager, Workflow Manager, Workflow Monitor, Repository Server, and Informatica Server - to load data from flat files and legacy data.
- Created mappings using the transformations like Source qualifier, Aggregator, Expression, Lookup, Router, Filter, Rank, Sequence Generator, Update Strategy, Joiner and stored procedure transformations.
- Designed the mappings between sources (external files and databases) to operational staging targets.
- Involved in data cleansing, mapping transformations and loading activities.
- Developed Informatica mappings and tuned them for optimum performance, dependencies, and batch design.
- Involved in the process design documentation of the Data Warehouse Dimensional Upgrades. Extensively used Informatica for loading the historical data from various tables for different departments.
Confidential
Software Engineer
Environment: ETL, Oracle 8i, Windows 95/NT, UNIX, Tomcat, XML, Java, Servlets, JSP, Akka
Responsibilities:
- Worked extensively on Informatica tools such as Designer (Source Analyzer, Warehouse Designer, Mapping Designer, Transformations), Workflow Manager, and Workflow Monitor.
- Developed mappings using the needed transformations in the tool according to technical specifications.
- Resolved performance issues with Informatica transformations and mappings with the help of technical specifications.
- Created and ran UNIX scripts for all pre/post-session ETL jobs.
- Participated in Review of Test Plan, Test Cases and Test Scripts prepared by system integration testing team.
- Developed a number of complex Informatica mappings and reusable transformations.
- Extensively used transformations such as Aggregator, Expression, connected and unconnected Lookups, and Update Strategy to load data into targets.
- Experience in debugging and performance tuning of targets, sources, mappings and sessions.