Senior Hadoop Engineer Resume
Florida
SUMMARY
- Overall 8+ years of IT experience, including 4+ years on the Hadoop ecosystem.
- Experience with the complete software development life cycle (SDLC), including requirements gathering, analysis, design, implementation, testing, support and maintenance
- Strong in Developing MapReduce Applications, Configuring the Development Environment, Tuning Jobs and Creating MapReduce Workflows
- Experience in performing data enrichment, cleansing, analytics, aggregations using Hive and Pig
- Experience in importing and exporting data between relational databases such as MySQL, Netezza and Oracle and HDFS/Hive using Sqoop
- Extensive experience handling and converting various data interchange formats (XML, JSON, Avro, Parquet) in distributed frameworks.
- Worked on reading multiple data formats on HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Implemented solutions on the Hadoop stack and various big data analytics tools, including migrations from SQL Server 2008 R2, Oracle and MySQL databases to Hadoop.
- Strong in designing specifications, functional and technical requirements, and process flows
- Experience in dealing with distributed systems, large-scale non-relational data stores, data modeling and multi-terabyte data warehouses
- Hands-on experience in application development and database management using Java, RDBMS, Linux/Unix shell scripting and Linux internals
- Experience in deploying applications on heterogeneous application servers: Tomcat, WebLogic, IBM WebSphere and Oracle Application Server.
- Experience in designing, installing, configuring and administering Hadoop clusters across the major distributions - Cloudera, Hortonworks and Apache Hadoop
- Experience in designing, building and implementing complete Hadoop ecosystem comprising of MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, Spark
- Worked in multi-cluster environments, setting up production and QA Hadoop clusters and benchmarking them on Amazon AWS/EC2 and Rackspace cloud environments
- Good hands-on experience in writing shell scripts in Linux/Unix
- Experience in developing use cases, activity diagrams, sequence diagrams and class diagrams using UML with Rational Rose and MS Visio
- Extensively implemented various types of ETL/EDW migration projects using MapReduce, Pig, Hive, Sqoop
- Experience with Apache Storm and streaming real-time CEP solutions.
- Understanding of Cloud architectures and computationally intensive development environments.
- Experience with Docker and Zookeeper, HDFS/Hadoop and cluster management.
- Expertise with Git, Maven and the Agile development process.
- Experience in using the Oozie, Control-M and Autosys workflow engines for managing and scheduling Hadoop jobs
- Worked with big data teams to offload ETL jobs from Teradata and Netezza to Hadoop.
- Experience in importing streaming logs and aggregating the data to HDFS using Flume
- Built ingestion framework using Kafka for streaming logs and aggregating the data into HDFS using Camus
- Knowledge in stream processing technologies like Apache Spark and Storm
- Experience in setting up monitoring tools like Ganglia and Nagios for Hadoop and HBase
- Involved in Analysis, Design, Coding and Development of Java custom Interfaces
- Hands-on experience with the SDLC in an agile environment
- Exceptional ability to quickly master new concepts and technologies.
- Proficient in communicating with people at all levels of the organization, very good at post-implementation support, and a team player with strong analytical and problem-solving skills
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Cassandra, HBase, Zookeeper, Spark, Impala, Kafka, Storm
Languages/Simulators: C, C++, Java, Python, SQL, UNIX Shell Scripting, Scala
Operating Systems: Windows Variants, Mac, UNIX, LINUX
Database: MySQL, Oracle
IDE Tools: Eclipse, NetBeans, SQL Developer, MS Visual Studio
Version Control: Git, SVN
Software Tools: MS Office Suite (Word, Excel, Project), MS Visio
Web Technologies: HTML, CSS, XML, PHP
Monitoring Tools: Ganglia, Nagios, Cloudera Manager
NoSQL Databases: Cassandra, HBase
PROFESSIONAL EXPERIENCE
Senior Hadoop Engineer
Confidential, Florida
Responsibilities:
- Developed simple to complex MapReduce jobs using Hive and Pig to perform analytics on data.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Involved in Hadoop cluster tasks such as commissioning and decommissioning nodes without affecting jobs running on the data.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Involved in running Hadoop Streaming jobs to process terabytes of text data.
- Introduced Oozie workflows for job-processing scripts.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Customized the parser/loader application for data migration to HBase.
- Loaded cached data into HBase using Sqoop to support various queries on the data.
- Created numerous external Hive tables pointing to HBase tables.
- Analyzed HBase data in Hive by creating external partitioned and bucketed tables to maintain query efficiency.
- Worked on reading multiple data formats on HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch at the end of this section).
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Performed all Linux operating system, disk management and patch management configurations, on Linux instances in AWS.
- Provided low-latency computations by caching the working dataset in memory and performing computations at memory speed using Spark.
- Implemented loading and transformation of large data sets across structured, semi-structured and unstructured formats.
- Used Spark to easily combine batch, interactive and streaming jobs in the same application.
- Wrote Hive queries and UDFs.
- Developed Hive queries to process the data and generate data cubes for visualization.
Environment: Hadoop, HDFS, MapReduce, Hive, Scala, Pig, Oozie, HBase, Spark, Kafka, Shell Scripting, MySQL, DB2, Oracle
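Illustrative sketch (not from the original codebase): a minimal example of converting a Hive aggregation into Spark RDD transformations in Scala, as referenced above. The table and column names (usage_logs, user_id, bytes_read) are hypothetical placeholders, and the actual jobs may have used earlier Spark APIs (HiveContext) rather than SparkSession.

```scala
import org.apache.spark.sql.SparkSession

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-rdd-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hive query being converted (hypothetical table/columns):
    //   SELECT user_id, SUM(bytes_read) FROM usage_logs GROUP BY user_id
    val rows = spark.table("usage_logs")
      .select("user_id", "bytes_read")
      .rdd

    // The same aggregation expressed as Spark RDD transformations
    val bytesPerUser = rows
      .map(r => (r.getString(0), r.getLong(1)))
      .reduceByKey(_ + _)

    bytesPerUser.take(20).foreach { case (user, bytes) =>
      println(s"$user\t$bytes")
    }

    spark.stop()
  }
}
```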
Confidential, San Rafael, CA
Senior Hadoop Consultant
Responsibilities:
- Developed custom MapReduce programs in Java to perform daily transformation of JSON data to text format and store it in HDFS according to business requirements
- Designed and implemented Hive tables, partitioning strategy, HCatalog usage and performance tuning of Hive queries
- Developed Pig scripts and UDFs for data cleansing and denormalization of multiple datasets
- Designed and implemented ETL workflows covering data ingestion from different databases/data warehouses into HDFS using Sqoop, transformation and analysis in Hive/Pig, and preprocessing of the raw data using MapReduce
- Wrote Sqoop pipelines to efficiently transfer data from MySQL, DB2, Oracle Exadata and Netezza to the Hadoop environment.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios
- Used the HDFS API to pull data from HDFS and the Solr API to push data into Solr for indexing
- Assisted the big data infrastructure team in building Hadoop CDH clusters for development and production environments
- Strong working knowledge of Spark Streaming, RDDs, Spark SQL and Scala
- Processed data through Spark using Scala and Spark SQL
- Developed Spark code using Scala and Spark SQL for faster testing and processing of data (see the sketch at the end of this section)
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis
- Developed a POC for the Kafka REST API to collect events from the front end
- Worked with the Hadoop business team on use case discovery, technical specifications and documentation
- Worked with different file formats and compression techniques in Hadoop, such as Avro, SequenceFile, LZO and Snappy
- Worked on historical data to eliminate duplicate data in HDFS using Hive
- Worked with big data analysts and the data science team in troubleshooting MapReduce job failures and issues with Hive and Pig
- Copied data from one cluster to another using DistCp and automated the procedure with shell scripts
- Automated shell scripts, MapReduce programs and Hive jobs, and created workflows using the Oozie scheduler.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Oozie, HBase, Spark, Kafka, Shell Scripting, Java, MySQL, DB2, Oracle, Scala
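Illustrative sketch (not from the original codebase): a minimal Scala/Spark SQL job in the spirit of the daily JSON processing and Spark SQL work described above. The HDFS paths, field names and date argument are hypothetical, and the production daily transform was implemented as a Java MapReduce program rather than this Spark version.

```scala
import org.apache.spark.sql.SparkSession

object DailyJsonTransform {
  def main(args: Array[String]): Unit = {
    // Hypothetical run date argument, e.g. "2016-03-01"
    val runDate = if (args.nonEmpty) args(0) else "2016-03-01"

    val spark = SparkSession.builder()
      .appName(s"daily-json-transform-$runDate")
      .getOrCreate()

    // Read the day's raw JSON events from HDFS (hypothetical path)
    val events = spark.read.json(s"hdfs:///data/raw/events/dt=$runDate")

    // Keep only the fields downstream jobs need, via Spark SQL
    events.createOrReplaceTempView("events")
    val cleaned = spark.sql(
      """SELECT event_id, account_id, event_type, event_ts
        |FROM events
        |WHERE event_id IS NOT NULL""".stripMargin)

    // Write tab-delimited text back to HDFS for legacy consumers
    cleaned.write
      .option("sep", "\t")
      .mode("overwrite")
      .csv(s"hdfs:///data/processed/events/dt=$runDate")

    spark.stop()
  }
}
```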
Confidential, Austin, Texas
Big Data Consultant
Responsibilities:
- Developed MapReduce jobs in Java for data cleansing and preprocessing
- Moved data between MS SQL Server/Oracle and HDFS in both directions using Sqoop
- Worked on collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis
- Worked with different file formats and compression techniques in Hadoop to determine standards
- Developed data pipelines using Pig and Hive from EDW sources; these pipelines included custom UDFs to extend the ETL functionality
- Developed Hive queries and UDFs to analyze and transform the data in HDFS
- Developed Hive scripts for implementing control table logic in HDFS
- Designed and implemented static and dynamic partitioning and bucketing in Hive (see the sketch at the end of this section)
- Developed Pig scripts and UDFs per the business logic
- Analyzed and transformed data in HDFS with Hive and Pig
- Developed Oozie workflows and scheduled through a scheduler on a monthly basis.
- Involved with Big data team in End to End implementation of ETL logic.
- Coordinated effectively with the offshore team and managed project deliverables on time.
- Worked on QA support activities such as test data creation and unit testing.
Environment: Hadoop, MapReduce MRv1, Hive, Oozie, Pig, Sqoop, Java, Eclipse IDE, Shell Scripting, MS SQL Server, Oracle
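Illustrative sketch (not from the original codebase): a minimal example of the static and dynamic Hive partitioning mentioned above. The database, table and column names are hypothetical, and the statements are plain HiveQL that run the same from the Hive CLI; they are wrapped in Scala/Spark here only to keep all sketches in one language, since this role itself used Hive directly rather than Spark.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hive settings required for dynamic-partition inserts
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Hypothetical target table partitioned by order date
    spark.sql(
      """CREATE TABLE IF NOT EXISTS orders_part (
        |  order_id    BIGINT,
        |  customer_id BIGINT,
        |  amount      DOUBLE)
        |PARTITIONED BY (order_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Static partition insert: the partition value is fixed in the statement
    spark.sql(
      """INSERT OVERWRITE TABLE orders_part PARTITION (order_date = '2015-01-01')
        |SELECT order_id, customer_id, amount
        |FROM orders_staging
        |WHERE order_date = '2015-01-01'""".stripMargin)

    // Dynamic partition insert: partitions derive from the trailing column
    spark.sql(
      """INSERT OVERWRITE TABLE orders_part PARTITION (order_date)
        |SELECT order_id, customer_id, amount, order_date
        |FROM orders_staging""".stripMargin)

    spark.stop()
  }
}
```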
Confidential, San Mateo, CA
Hadoop Consultant
Responsibilities:
- Managed and analyzed Hadoop Log Files
- Managed jobs using Fair Scheduler
- Configured the Hive Metastore to use a MySQL database to support multiple user connections to Hive tables
- Imported data into HDFS using Sqoop
- Retrieved data from databases such as MySQL and Oracle into HDFS using Sqoop and ingested it into HBase
- Developed Hive Queries to analyze the data in HDFS to identify issues and behavioral patterns
- Worked on shell scripting to automate jobs.
- Used Pig Latin to analyze datasets and perform transformations according to business requirements
- Configured Nagios for receiving alerts on critical failures in the cluster by integrating with custom Shell Scripts
- Configured the Ganglia monitoring tool to monitor both Hadoop and system specific metrics
- Implemented Flume to import streaming log data and aggregate it into HDFS.
- Implemented MapReduce programs to perform joins using secondary sorting and distributed cache
- Generated daily and weekly Status Reports to the team manager and participated in weekly status meeting with Team members, Business analysts and Development team
Environment: Apache Hadoop 0.20.203, MapReduce, Hive, Apache Maven, Java, Eclipse IDE, Sqoop, Ganglia, Nagios, Shell Scripting, Pig, Flume.
Confidential
System Analyst
Responsibilities:
- Key responsibilities included requirements gathering and designing and developing Java applications
- Implemented design patterns and object-oriented Java design concepts to build the code
- Participated in planning and development of UML diagrams like Use Case Diagrams, Object Diagrams, Class Diagrams and Sequence Diagrams to represent the detail design phase
- Identified and fixed transactional issues caused by incorrect exception handling and concurrency issues caused by unsynchronized blocks of code
- Created a Java application module to provide user authentication and to synchronize handsets with the Exchange server
- Performed unit testing, system testing and user acceptance test
- Involved in Analysis, Design, Coding and Development of custom Interfaces
- Gathered requirements from the client for designing the Web Pages
- Gathered specifications for the Library site from different departments and users of the services
- Assisted in proposing suitable UML class diagrams for the project
- Wrote SQL scripts to create and maintain the database, roles, users, tables, views, procedures and triggers
- Designed and implemented the UI using HTML and Java
- Strong knowledge of the MVC design pattern
- Worked on the database interaction layer for insert, update and retrieval operations on data
- Implemented Multi-threading functionality using Java Threading API
Environment: Java, JDBC, HTML, SQL, Oracle, IBM Rational Rose, Eclipse IDE, LDAP