Sr. Hadoop Developer Resume
Milpitas, CA
SUMMARY:
- Over 8 years of overall experience as a software developer in designing, developing, deploying, and supporting large-scale distributed systems.
- Over 3 years of extensive experience as a Hadoop Developer and Big Data Analyst.
- Primary technical skills in HDFS, MapReduce, YARN, Pig, Hive, Sqoop, HBase, Flume, Oozie, and ZooKeeper.
- Good experience extracting data and generating statistical analyses using the Business Intelligence tool QlikView for better analysis of data.
- In-depth understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, along with MapReduce concepts and experience writing MapReduce programs with Apache Hadoop to analyze large data sets efficiently.
- Hands-on experience working with ecosystem components such as Hive, Pig, Sqoop, MapReduce, Flume, and Oozie. Strong knowledge of Pig and Hive analytical functions and of extending Hive and Pig core functionality by writing custom UDFs.
- Hands-on experience developing Teradata PL/SQL procedures and functions and SQL tuning of medium/large databases.
- Strong experience working with databases such as Teradata and proficiency in writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures, and functions.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Knowledge of job workflow scheduling and monitoring tools such as Oozie and Ganglia; NoSQL databases such as HBase, Cassandra, and BigTable; and administrative tasks such as installing Hadoop, commissioning and decommissioning nodes, and managing ecosystem components such as Flume, Oozie, Hive, and Pig.
- Experience in design, development, and testing of distributed, Internet/Intranet/E-Commerce, client/server, and database applications, mainly using Java, Servlets, JDBC, JSP, Struts, Hibernate, Spring, and JavaScript on WebLogic and Apache Tomcat web/application servers, with Oracle and SQL Server databases on UNIX and Windows NT platforms.
- Extensive experience with databases such as SQL Server 2005/2008 and Oracle 11g.
- Experience in writing SQL queries, Stored Procedures, Triggers, Cursors and Packages.
- Worked on ETL tools such as Informatica for the Best Buy project.
- Experience handling XML files and related technologies such as the Informatica XML Parser and XML Writer.
- Performed performance tuning at the source, target, and DataStage job levels using indexes, hints, and partitioning in DB2, Oracle, and DataStage.
- Good knowledge and experience in performance tuning of live ETL/ELT jobs built on Informatica.
- Experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL, Kafka, and Spark Streaming.
- Experience writing database objects such as stored procedures and triggers for Oracle and MS SQL.
- Good knowledge of PL/SQL and hands-on experience writing medium-complexity SQL queries.
- Good knowledge of Impala, Spark/Scala, Shark, Storm, and Ganglia.
- Expertise in preparing test cases, documenting, and performing unit and integration testing.
- In-depth understanding of data structures, algorithms, and optimization.
- Strong knowledge of Software Development Life Cycle and expertise in detailed design documentation.
- Fast learner with good interpersonal skills, strong analytical and communication skills, and an interest in problem solving and troubleshooting.
- Self-motivated, excellent team player with a positive attitude who adheres to strict deadlines.
TECHNICAL SKILLS:
Operating systems: WINDOWS, LINUX (Fedora, CentOS), UNIX
Languages and Technologies: C, C++, Java, SQL, PL/SQL
Scripting Languages: Shell scripting, Python
Databases: Oracle, MySQL, PostgreSQL, Teradata
IDE: Eclipse and NetBeans
Application Servers: Apache Tomcat, Apache HTTP Server
Versioning Systems: Git, SVN
Hadoop Ecosystem: Hadoop MapReduce, HDFS, Flume, Sqoop, Hive, Pig, Oozie
Apache Spark: Spark, Spark SQL, Spark Streaming, Scala
Cluster Mgmt. & Monitoring: Cloudera Manager, Hortonworks Ambari, Ganglia, Nagios
Security: Kerberos
NoSQL: Cassandra, HBase, DataStax
PROFESSIONAL EXPERIENCE:
Confidential, Milpitas, CA
Sr. Hadoop Developer
- Used Sqoop to transfer data between RDBMS and HDFS.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
- Worked on migrating MapReduce and Python programs to Spark transformations using Spark and Scala.
- Consulted with business partners and made recommendations to improve the effectiveness of Big Data systems, descriptive analytics systems, and prescriptive analytics systems.
- Designed and implemented custom Writables, custom InputFormats, custom partitioners, and custom comparators in MapReduce.
- Performed complete analysis, provided the design, and prepared the ETL mapping specs for the source and target.
- Designed and deployed a Spark cluster and various Big Data analytics tools, including Spark, Kafka streaming, AWS, and HBase, with the Cloudera distribution.
- Migrated ETL jobs to Pig scripts to perform transformations, joins, and pre-aggregations before storing the data in HDFS.
- Involved in source system analysis, data analysis, data modeling to ETL (Extract, Transform and Load) and Hive QL.
- Used the ELK stack (Elasticsearch, Logstash, and Kibana) to implement name-search patterns for a customer.
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Scala, Spark SQL, DataFrames, and pair RDDs.
- Converted existing SQL queries into HiveQL queries.
- Read log files using Elasticsearch and Logstash, alerted users on issues, and saved the alert details to MongoDB for analysis.
- Implemented UDFs, UDAFs, and UDTFs in Java for Hive to process data in ways that Hive's built-in functions cannot handle (see the sketch after this list).
- Effectively used Oozie to develop automated workflows of Sqoop, MapReduce, and Hive jobs.
- Responsible for preparing the technical requirements for Informatica ELT mapping development using the Informatica Cloud Services/Salesforce/Windows platform from BDD logical models.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Designed and tested data ingestion to handle data from multiple sources into the enterprise data lake.
- Loaded and analyzed logs generated by different web applications using Flume.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Used Pig to perform data validation on data ingested with Sqoop and Flume, and pushed the cleansed data set into HBase.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats, such as text, zip, XML, and JSON.
- Wrote multiple MapReduce programs to extract, transform, and aggregate data from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Developed Pig UDFs to manipulate data according to business requirements and worked on developing custom Pig loaders.
- Installed and configured Pig and wrote Pig Latin scripts.
- Responsible for creating Hive tables based on business requirements.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
- Experienced in all facets of Software Development Life Cycle (Analysis, Design, Development, Implementation, Testing and Maintenance) using waterfall and agile methodologies.
- Involved in NoSQL database design, integration and implementation.
- Loaded data into NoSQL database HBase.
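The following is a minimal sketch of the kind of custom Hive UDF described above. It is illustrative only: the class name, the masking logic, and the registration example are hypothetical, not the project's actual code.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example: mask the local part of an email address,
// something Hive's built-in string functions do not cover directly.
public final class MaskEmailUDF extends UDF {
    public Text evaluate(Text email) {
        if (email == null) {
            return null;                  // preserve NULLs, as Hive expects
        }
        String value = email.toString();
        int at = value.indexOf('@');
        if (at <= 0) {
            return email;                 // not an email, pass through unchanged
        }
        return new Text("****" + value.substring(at));
    }
}
```

Packaged into a JAR, such a function would typically be registered and called from HiveQL, for example: ADD JAR /path/to/udfs.jar; CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmailUDF'; SELECT mask_email(email) FROM customers; (the path and table name here are hypothetical).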
Environment: Amazon Web Services (AWS EC2), Sqoop, MapReduce, Hive, Oozie, Pig Latin, HBase
Confidential, Milwaukee, WI
Hadoop Developer
- Involved in building a multi-node Hadoop cluster.
- Used Big Data to design and build a portfolio of event-driven and long-short trading algorithms that exploit various cyclical and seasonal trading opportunities using TradeStation.
- Configured the Hive Metastore to use an Oracle database to support multiple user connections to Hive tables.
- Performed complete analysis, provided the design, and prepared the ETL mapping specs for the source and target.
- Implemented daily jobs that automate the parallel loading of data into HDFS using AutoSys and Oozie coordinator jobs.
- Imported data into HDFS using Sqoop.
- Extensive experience with Spark Streaming (version 1.5.2) through the core Spark API, running Scala, Java, and Python scripts to transform raw data from several data sources into baseline data.
- Hands-on expertise running Spark and Spark SQL on Amazon Elastic MapReduce (EMR); created new subscriptions in Azure and granted the necessary privileges as needed.
- Designed and developed Big Data analytics platform for processing customer viewing preferences and social media comments using Java, Hadoop, Hive and Pig.
- Experience retrieving data from databases such as MySQL and Oracle into HDFS using Sqoop and ingesting it into HBase.
- Used Pig Latin to analyze datasets and perform transformations according to business requirements.
- Worked entirely in Agile methodologies and used the Rally Scrum tool to track user stories and team performance.
- Used Microsoft Azure for building, testing, and deploying the applications.
- Good knowledge in cloud integration with Amazon Elastic MapReduce (EMR).
- Performed data profiling and data analysis in Teradata for data mart design and to investigate issues in the data warehouse.
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS; used a Big Data tool to load large volumes of source files from S3 to Redshift.
- Implemented fine-tuning mechanisms such as indexing, partitioning, and bucketing to tune the Teradata/Hive databases, which helped business users fetch reports more efficiently.
- Worked extensively with databases such as Teradata, writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures, and functions.
- Performed procedures such as text analytics and text processing using the in-memory computing capabilities of Spark with Scala.
- Migrated the code into QA (testing) and supported the QA and UAT (user acceptance testing) teams.
- Configured Nagios for receiving alerts on critical failures in the cluster by integrating with custom Shell Scripts.
- Worked on implementing Flume to import streaming log data and aggregate the data into HDFS.
- Worked on migrating MapReduce and Python programs to Spark transformations using Spark and Scala.
- Used the Spark API with Scala over Hadoop YARN to perform analytics on data in Hive (see the sketch after this list).
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, Scala, DataFrames, and pair RDDs.
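Below is a minimal sketch of running analytics on Hive data through Spark, shown here with the Spark Java API against a Hive-enabled SparkSession (the project also used Scala); the table and column names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveAnalytics {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("HiveAnalytics")
                .enableHiveSupport()          // read tables registered in the Hive metastore
                .getOrCreate();

        // Hypothetical table/columns: daily event counts per customer
        Dataset<Row> summary = spark.sql(
                "SELECT customer_id, event_date, COUNT(*) AS event_count " +
                "FROM web_logs GROUP BY customer_id, event_date");

        // Write the aggregate back to a Hive table for downstream reporting
        summary.write().mode("overwrite").saveAsTable("web_log_daily_summary");

        spark.stop();
    }
}
```

Submitted with spark-submit against YARN, a job like this runs on the cluster; enableHiveSupport() assumes hive-site.xml is on the classpath so Spark can reach the metastore.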
Environment: Hadoop, HDFS, HBase, MapReduce, Java, JDK 1.5, J2EE 1.4, Struts 1.3, Spark, Hive, Pig, Sqoop, Flume, Impala, Oozie, Hue, Solr, Zookeeper, Kafka, AVRO Files, SQL, ETL, DWH, Cloudera Manager, Talend, MySQL, Scala, MongoDB.
Confidential, Sunnyvale, VA
Big Data Engineer
- Developed solutions to process data into HDFS (Hadoop Distributed File System), process within Hadoop and emit the summary results from Hadoop to downstream systems.
- Developed Java MapReduce programs to analyze sample log files stored in the cluster (see the sketch after this list).
- Developed MapReduce programs for data analysis and data cleansing.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
- Used Sqoop extensively to ingest data from various source systems into HDFS.
- Monitored the Hadoop cluster using tools such as Nagios, Ganglia, Ambari, and Cloudera Manager.
- Experience analyzing and reviewing Hadoop log files, AutoSys log files, and system log files.
- Experience upgrading Apache Ambari, CDH, and HDP clusters.
- Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
- Imported structured data and tables into HBase.
- Experienced with different kinds of compression techniques, such as LZO, gzip, and Snappy.
- Wrote automation scripts to monitor HDFS and HBase through cron jobs.
- Used Hive to produce results quickly based on the requested reports.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, Spark and loaded data into HDFS.
- Integrated data from multiple sources (SQL Server, DB2) into the Hadoop cluster and analyzed it using Hive-HBase integration.
- Developed Pig UDFs for needed functionality, such as a custom Pig loader known as the timestamp loader.
- Used Oozie and ZooKeeper to automate job flow and cluster coordination, respectively.
- Tested the performance of the data sets on various NoSQL databases.
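The following is a minimal sketch of a Java MapReduce log-analysis job of the kind described above; the log format (space-delimited, with the HTTP status code in the ninth field) and class names are assumptions for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical example: count occurrences of each HTTP status code in web server logs.
public class StatusCodeCount {

    public static class LogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text status = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(" ");
            if (fields.length > 8) {              // skip malformed lines (basic data cleansing)
                status.set(fields[8]);            // assumed position of the status code
                context.write(status, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "status-code-count");
        job.setJarByClass(StatusCodeCount.class);
        job.setMapperClass(LogMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```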
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Spark, Scala, Sqoop, Kerberos, Java Eclipse, SQL Server, Oozie, Zookeeper, Shell Scripting.
Confidential
Hadoop Developer
- Involved in analysis of end user requirements and business rules based on given documentation and worked closely with tech leads and Business analysts in understanding the current system.
- Analyzed the business requirements and involved in writing Test Plans and Test Cases.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Designed and implemented Incremental Imports into Hive tables.
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Implemented the project using the Agile Scrum methodology; participated in daily stand-up meetings, sprint showcases, and sprint retrospectives.
- Experienced in managing and reviewing the Hadoop log files.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Involved in setting up and benchmarking Hadoop/HBase clusters for internal use.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Pig Scripts.
- Wrote SQL queries to perform Data Validation and Data Integrity testing.
- Created SQL*Loader scripts to load legacy data into Oracle staging tables.
- Developed UNIX shell scripts to run the batch jobs.
Environment: Hadoop, HDFS, MapReduce, Sqoop, Agile, Oozie, Pig, Hive, Flume, LINUX, Java, Eclipse, Cassandra.
Confidential
Java Developer
- Involved in design phase meetings for Business Analysis and Requirements gathering.
- Worked with business functional lead to review and finalize requirements and data profiling analysis.
- Worked on entry level Java programming assignments.
- Responsible for gathering the requirements, designing and developing the applications.
- Worked on UML diagrams for the project use case.
- Worked with Java String manipulation to parse CSV data for applications (see the sketch after this list).
- Connected Java applications to a database to read and write data.
- Developed static and dynamic Web Pages using JSP, HTML and CSS.
- Worked on JavaScript for data validation on client side.
- Involved in structuring Wikis and Forums for product documentation.
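Below is a minimal sketch of the kind of String-based CSV parsing described above; it is illustrative only (the record contents are made up) and does not handle quoted fields containing commas.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: split simple comma-separated records and trim surrounding whitespace.
public class CsvLineParser {

    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        if (line == null || line.isEmpty()) {
            return fields;
        }
        for (String field : line.split(",", -1)) {   // -1 keeps trailing empty fields
            fields.add(field.trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String record = "1001, John Doe , john@example.com,";
        System.out.println(parseLine(record));
        // prints: [1001, John Doe, john@example.com, ]  -- trailing empty field preserved
    }
}
```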
Environment: Java, Servlets, Spring, JSP, JavaScript, HTML, PHP, CSS, Eclipse