Hadoop Engineer Lead Resume
NY
SUMMARY
- 11+ years of diversified experience in software design and development, including experience as a Hadoop developer solving business use cases for several clients, with expertise in backend applications.
TECHNICAL SKILLS
Programming Languages: Java, Scala, Python
Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, ZooKeeper, HBase, Impala, Oozie, Hue, MongoDB
ETL Tools: Informatica PowerCenter 6.1/7.1/9.1
Operating Systems: MS-DOS, Windows 95/98/NT/XP/7, Linux, UNIX
Web Technologies: JSP, JDBC, CSS
Databases: Oracle, MySQL
Application/Web Servers: Apache Tomcat 4.0, WebLogic, TFS
Testing and Tracking Tools: QuickTest Pro, Selenium, LoadRunner, Quality Center, HP ALM, JIRA
PROFESSIONAL EXPERIENCE
Confidential, NY
Hadoop Engineer Lead
Environment: Windows 7/Linux, Hadoop 2.0, YARN, SharePoint 2014, Amazon S3, Hive, Pig, MapReduce, Impala, Sqoop, Flume, MongoDB, ZooKeeper, Kafka, HBase, PuTTY, MySQL, Cloudera, Agile, shell scripting, Java, Scala
Responsibilities:
- Set up a multi-cluster environment running CDH 5.11.
- Worked with highly unstructured and semi-structured data, 2,048 TB in size.
- Set up both prod and dev nodes (120 nodes in total); HDFS storage spans 80 nodes with 12 disks per node at 3.7 TB each.
- Loaded data from external sources into target tables using Impala queries.
- Set up a MongoDB cluster, including user roles and security.
- Developed JVM applications with Scala and Java.
- Extensive understanding of Hortonworks Data Platform (HDP 2.6).
- Created an Oozie workflow for data ingestion that runs weekly.
- Performed sentiment analysis with Spark Streaming: ingested Facebook, Twitter, and Instagram data through Flume, processed the jobs in Spark (Java/Scala), and stored the results in HBase and MongoDB.
- Read and wrote Hive data through Spark SQL and DataFrames.
- Set up monitoring, alerts, and events on the CDH cluster with Linux shell scripting.
- Configured and monitored all services running on the cluster, including HBase, Flume, Impala, Hive, Pig, and Kafka.
- Secured the cluster with Kerberos.
- Used the Spark MLlib machine learning library to train models for sentiment analysis on social media data.
- Processed customer feedback data from Press Ganey surveys with Spark (Scala) and stored it in Hive tables for further analysis in Tableau.
- Imported data from RDBMS sources with Sqoop jobs and troubleshot any jobs that failed to finish (see the ingestion sketch after this list).
- Handled the complete ETL process, loading external vendor data into target tables.
- Implemented GitHub as the source versioning and code migration repository.
- Pulled files from Amazon S3 buckets to the prod cluster and loaded them into Amazon Redshift.
- Granted internal and external users access to Cloudera Manager, Hue, and Oozie by creating SR requests.
- Granted and revoked user privileges on both the dev and prod clusters.
- Ran daily health tests on the services running on Cloudera.
- Downloaded data from FTP to the local clusters with shell scripts and loaded it into target tables using partitioning and bucketing techniques.
- Scheduled jobs in crontab and created Oozie workflows and coordinators (see the scheduling sketch after this list).
- Monitored all crontab and Oozie jobs and debugged issues when any job failed to complete.
- Performed data analytics and reporting with Tableau.
- Sent a daily status report of cluster services to the Scrum Master.
- Created documentation for all jobs running on the cluster.
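A minimal sketch of the RDBMS-to-Hive ingestion described above, assuming a hypothetical MySQL source table (orders) and Hive staging table (sales.orders_stg); the connection string, credentials, paths, and column names are illustrative, not the actual project code.

    #!/bin/bash
    # Incremental Sqoop import from a hypothetical MySQL source into HDFS,
    # then load the new files into a date-partitioned Hive table.
    set -euo pipefail

    LOAD_DATE=$(date +%Y-%m-%d)

    # Pull only rows added since the last recorded value of the key column
    # (last value tracked in a simple local file for this sketch).
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user --password-file /user/etl/.dbpass \
      --table orders \
      --incremental append --check-column order_id --last-value "$(cat /tmp/last_value)" \
      --target-dir /data/staging/orders/${LOAD_DATE} \
      --num-mappers 4

    # Move the imported files into the matching partition of the target table.
    hive -e "
      LOAD DATA INPATH '/data/staging/orders/${LOAD_DATE}'
      INTO TABLE sales.orders_stg PARTITION (load_date='${LOAD_DATE}');
    "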
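And a sketch of the scheduling setup mentioned above: a crontab entry driving a wrapper script, plus the equivalent Oozie coordinator submission. The host, paths, property file, and job id are assumptions for illustration.

    # Crontab entry: run the ingestion wrapper every Sunday at 02:00.
    0 2 * * 0 /opt/etl/bin/run_ingestion.sh >> /var/log/etl/ingestion.log 2>&1

    # Equivalent Oozie submission: a coordinator described in a properties file
    # handles the weekly schedule and workflow orchestration.
    oozie job -oozie http://oozie-host:11000/oozie \
      -config /opt/etl/conf/ingestion-coord.properties -run

    # Check the status of a running coordinator/workflow by its job id.
    oozie job -oozie http://oozie-host:11000/oozie -info 0000123-200101000000001-oozie-oozi-C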
Confidential, Ridgefield, NJ
Big Data/Hadoop Engineer Lead
Environment: Windows 7/Linux, SharePoint 2014, Hadoop 2.0, Hive, Pig, MapReduce, Sqoop, ZooKeeper, TFS, VS 2015, PuTTY, MySQL, Cloudera, Agile, Teradata, shell scripting, Java, Scala, Sentry
Responsibilities:
- Worked on a live 60-node Hadoop cluster running CDH 5.8.
- Worked with highly unstructured and semi-structured data, 1,900 TB in size (270 GB with a replication factor of 3).
- Extracted data from Teradata/RDBMS into HDFS using Sqoop (version 1.4.6).
- Created and ran Sqoop (version 1.4.6) jobs with incremental load to populate Hive external tables.
- Set up Amazon Redshift and imported data from RDBMS sources into Redshift.
- Worked on Spark Streaming with Kafka; ingested data with Spark SQL and created Spark DataFrames.
- Wrote extensive Pig (version 0.15) scripts to transform raw data from several big data sources into baseline datasets.
- Developed Hive (version 1.2.1) scripts for end-user/analyst requirements to perform ad hoc analysis.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external tables to optimize performance (see the Hive DDL sketch after this list).
- Solved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate into MapReduce jobs.
- Strong knowledge of multi-cluster environments and setting up the Cloudera Hadoop ecosystem; experienced in installation, configuration, and management of Hadoop clusters.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed big data/Hadoop logs.
- Developed shell scripts for Oozie workflows.
- Used the reporting features of Talend to generate BI reports.
- Integrated Talend with Hadoop to process big data jobs.
- Good knowledge of Solr and Kafka.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Installed and configured Storm, Solr, Flume, Sqoop, Pig, Hive, HBase on Hadoop clusters.
- Provided detailed reporting of work as required by project status reports.
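A minimal sketch of the partitioning and bucketing approach referenced above, assuming a hypothetical clickstream table; the HiveServer2 URL, columns, HDFS location, and bucket count are illustrative only.

    # Create an external Hive table partitioned by load date and bucketed by user_id.
    beeline -u jdbc:hive2://hiveserver:10000/default -e "
      CREATE EXTERNAL TABLE IF NOT EXISTS clickstream_events (
        user_id  BIGINT,
        url      STRING,
        event_ts TIMESTAMP)
      PARTITIONED BY (load_date STRING)
      CLUSTERED BY (user_id) INTO 32 BUCKETS
      STORED AS ORC
      LOCATION '/data/warehouse/clickstream_events'"

    # Filtering on the partition column prunes the scan to a single HDFS directory.
    beeline -u jdbc:hive2://hiveserver:10000/default \
      -e "SELECT COUNT(*) FROM clickstream_events WHERE load_date = '2016-05-01'"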
Confidential, Indianapolis, IN
Big Data/ Hadoop Developer
Environment: Windows 7/Linux/Unix, SharePoint 2013, Hadoop 2.0, Eclipse, Hive, Pig, MapReduce, Sqoop, HBase, ZooKeeper, HP ALM, PuTTY, Oracle/Teradata, Cloudera, Agile
Responsibilities:
- Designed and implemented ETL processes in big data/Hadoop ecosystems.
- Hands-on experience with Cloudera and migrating big data from Oracle with Sqoop (version 1.4.3).
- Very good experience with both MapReduce 1 (JobTracker/TaskTracker) and MapReduce 2 (YARN).
- Imported data from Teradata/Oracle with Sqoop (version 1.4.3).
- Implemented a de-duplication process to avoid duplicates in the daily load.
- Developed several advanced MapReduce (Java) and Python programs as part of functional requirements for big data.
- Developed Hive (version 1.1.1) scripts as part of functional requirements and implemented Hadoop security with Kerberos (see the Kerberos access sketch after this list).
- Worked with the admin team on designing the upgrade from CDH 3 to CDH 4.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Experience using SequenceFile, RCFile, and Avro file formats.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Processed and analyzed ETL/big data jobs in Talend.
- Tested and generated reports in Talend.
- Successfully integrated Hive tables with MySQL database.
- Worked in a UNIX-based environment for data operations.
- Experience working on NoSQL databases, including HBase.
- Deployed code changes using TeamCity builds.
- Involved in handling code fixes during production release.
- Managed Hadoop clusters, including adding and removing cluster nodes for maintenance and capacity needs.
- Provide detailed reporting of work as required by project status reports.
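A short sketch of how a batch job authenticates against the Kerberized cluster mentioned above; the keytab path, principal names, and realm are assumptions for illustration.

    #!/bin/bash
    # Obtain a Kerberos ticket from a service keytab, then connect to HiveServer2
    # using the Hive service principal.
    set -euo pipefail

    kinit -kt /etc/security/keytabs/etl.service.keytab etl/etl-host@EXAMPLE.COM

    # Confirm the ticket before the job runs.
    klist

    beeline -u "jdbc:hive2://hiveserver:10000/default;principal=hive/_HOST@EXAMPLE.COM" \
      -e "SHOW DATABASES"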
Confidential, Folsom, CA
Big Data/ Hadoop Developer
Environment: Windows 7, SharePoint 2013, Hadoop 1.0, Eclipse, Pig, Hive, Flume, Sqoop, HBase, PuTTY, HP ALM, WinSCP, Agile, MySQL
Responsibilities:
- Built scalable distributed data solutions using Hadoop and migrated legacy retail ETL applications to Hadoop.
- Accessed equipment data delivered over mobile networks and satellite links.
- Hands-on extraction of data from different databases and copying it into HDFS using Sqoop.
- Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
- Hands-on creation of applications on social networking websites to obtain access to their data.
- Developed simple to complex MapReduce jobs using Hive and Pig for analyzing the data.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources to derive results from the data.
- Worked with cloud services such as Amazon Web Services (AWS).
- Used Oozie workflow engine to run multiple Hive and Pig jobs.
- Exported the analyzed data into relational databases using Sqoop for visualization and report generation by the BI team (see the Sqoop export sketch after this list).
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
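A brief sketch of the Sqoop export step called out above, assuming the analyzed results sit in an HDFS output directory and land in a hypothetical MySQL reporting table; database, table, and path names are illustrative.

    # Export aggregated results from HDFS into a MySQL table consumed by the BI team.
    sqoop export \
      --connect jdbc:mysql://reportdb:3306/bi \
      --username bi_user --password-file /user/etl/.bipass \
      --table daily_sales_summary \
      --export-dir /data/output/daily_sales_summary \
      --input-fields-terminated-by '\t' \
      --num-mappers 4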
Confidential, Warren, MI
ETL/ Informatica/Java Developer
Environment: Windows XP/NT, Informatica 6.2, UNIX, Java, Oracle, SQL, PL/SQL
Responsibilities:
- Participated in documenting the existing operational systems.
- Involved in the requirements gathering for the warehouse. Presented the requirements and a design document to the client.
- Created ETL jobs to load data from the staging area into the data warehouse.
- Analyzed the requirements and framed the business logic for the ETL process.
- Involved in the ETL design and its documentation.
- Experience with manufacturing-unit activities such as planning, purchasing, and sales.
- Involved in creating use cases for purchase and sale.
- Designed and developed complex aggregate, join, and lookup transformation rules (business rules) to generate consolidated (fact/summary) data using Informatica PowerCenter 6.0.
- Designed and developed mappings using Source Qualifier, Aggregator, Joiner, Lookup, Sequence Generator, Stored Procedure, Expression, Filter, and Rank transformations.
- Developed pre-session, post-session, and batch execution routines using Informatica Server to run Informatica sessions (see the wrapper-script sketch after this list).
- Evaluated the level of granularity
- Evaluated slowly changing dimension tables and its impact to the overall Data Warehouse including changes to Source-Target mapping, transformation process, database, etc.
- Collected and linked metadata from diverse sources, including Oracle relational databases, XML, and flat files.
- Created, optimized, reviewed, and executed Teradata SQL test queries to validate transformation rules used in source to target mappings/source views, and to verify data in target tables
- Extensive experience with PL/SQL in designing, developing functions, procedures, triggers and packages.
- Developed Informatica mappings, reusable sessions, and mapplets for data loads to the data warehouse.
- Designed and developed Informatica mappings and workflows; identified and removed bottlenecks to improve the performance of mappings and workflows, and used the Debugger to test mappings and fix bugs.
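A hedged sketch of the batch execution routine mentioned above: a UNIX wrapper that runs a pre-session step, launches an Informatica workflow with pmcmd, and checks the result. The service, domain, folder, workflow, and file names are assumptions, and the pmcmd flags shown follow later PowerCenter releases; the 6.x syntax of the era differed.

    #!/bin/sh
    # Pre-session step, workflow launch via pmcmd, and a simple post-session check.

    # Pre-session: stage the source flat file where the session expects it.
    cp /data/incoming/orders_$(date +%Y%m%d).dat /infa/srcfiles/orders.dat

    # Launch the workflow and wait for it to finish.
    pmcmd startworkflow -sv INT_SVC -d DOMAIN_DEV -u etl_user -p "$INFA_PASS" \
      -f SALES_DW -wait wf_load_orders
    RC=$?

    # Post-session: archive the processed file only if the workflow succeeded.
    if [ "$RC" -eq 0 ]; then
      mv /infa/srcfiles/orders.dat /data/archive/orders_$(date +%Y%m%d).dat
    else
      echo "Workflow wf_load_orders failed with code $RC" | mail -s "ETL failure" etl-oncall@example.com
      exit "$RC"
    fi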
Confidential
Software Engineer
Environment: Informatica 6.1, PL/SQL, MS Access, Oracle, Windows, UNIX, Java 1.8, RESTful web services, SOA, Spring, Ajax, JavaScript, CSS 3, JSP, Servlet, JSTL, JPA, Hibernate, JUnit, MySQL, Tomcat, JSON
Responsibilities:
- Worked extensively with data modelers to implement logical and physical data modeling for an enterprise-level data warehouse.
- Created and Modified T-SQL stored procedures for data retrieval from MS SQL SERVER database.
- Automated mappings to run using UNIX shell scripts, which included Pre and Post-session jobs and extracted data from Transaction System into Staging Area.
- Extensively used Informatica PowerCenter 6.1/6.2 to extract data from various sources and load it into the staging database.
- Worked extensively with Informatica tools - Source Analyzer, Warehouse Designer, Transformation Developer, Mapplet Designer, Mapping Designer, Repository Manager, Workflow Manager, Workflow Monitor, Repository Server, and Informatica Server - to load data from flat files and legacy data.
- Created mappings using the transformations like Source qualifier, Aggregator, Expression, Lookup, Router, Filter, Rank, Sequence Generator, Update Strategy, Joiner and stored procedure transformations.
- Designed the mappings between sources (external files and databases) to operational staging targets.
- Involved in data cleansing, mapping transformations and loading activities.
- Developed Informatica mappings and tuned them for optimum performance, dependencies, and batch design.
- Involved in the process design documentation of the Data Warehouse Dimensional Upgrades. Extensively used Informatica for loading the historical data from various tables for different departments.
Confidential
Software Engineer
Environment: ETL, Oracle 8i, Windows 95/NT, UNIX, Tomcat, XML, Java, Servlets, JSP, Akka
Responsibilities:
- Worked extensively on Informatica tools such as Designer (Source Analyzer, Warehouse Designer, Mapping Designer, Transformations), Workflow Manager, and Workflow Monitor.
- Developed mappings using the needed transformations in the tool according to technical specifications.
- Resolved performance issues with Informatica transformations and mappings with the help of technical specifications.
- Created and ran UNIX scripts for all pre/post-session ETL jobs.
- Participated in Review of Test Plan, Test Cases and Test Scripts prepared by system integration testing team.
- Developed a number of complex Informatica mappings and reusable transformations.
- Extensively used transformations such as Aggregator, Expression, connected and unconnected Lookups, and Update Strategy to load data into targets.
- Experience in debugging and performance tuning of targets, sources, mappings and sessions.