Hadoop/Big Data Architect Resume
San Jose, CA
SUMMARY:
- Over 12 years of experience in enterprise application and product development.
- Completed CloudU certification.
- Experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem, including HDFS, MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark, Storm, Scala, Kafka, Oozie, Zookeeper, Azure, MongoDB, and Cassandra.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Hortonworks Data Platform (HDP), Cloudera Manager, and Apache Ambari.
- Hands-on experience in installing, configuring, and troubleshooting Hadoop ecosystem components including MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Flume, Zookeeper, Hue, Kafka, Storm, and Impala.
- Configured and managed user permissions in Hue.
- Expertise in Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1, and MRv2.
- Extensive experience in writing MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Built automated test frameworks to test data values, data integrity, source-to-target record counts, and field mappings between transactional, analytical data warehouse, and reporting systems.
- Read and wrote Hive data through Spark SQL and DataFrames (see the sketch following this list).
- Experienced in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice-versa.
- Set up and monitored cluster services such as HBase, Flume, Impala, Hive, Pig, and Kafka.
- Applied machine learning algorithms to train models for sentiment analysis on client data.
- Worked on data lake processing of manufacturing data with Spark and Scala, storing results in Hive tables for further analysis with Tableau.
- Proficient in programming with Spark and Scala and experienced with related components of the framework, such as Impala; also able to code in Python.
- Proficient in developing data transformation and other analytical applications in Spark and Spark SQL using the Scala programming language.
- Extensive experience in creating real-time data streaming solutions using Apache Spark Streaming and Kafka.
- Experienced in developing Spark scripts for data analysis in both Python and Scala.
- Experience developing Scala applications for loading/streaming data into NoSQL databases (HBase) and into HDFS.
- Experience in developing and designing POCs deployed on YARN clusters, comparing the performance of Spark with Hive and SQL/Oracle.
- Strong exposure to NoSQL databases such as HBase, MongoDB, and Cassandra.
- Good knowledge of Kafka and messaging systems such as RabbitMQ.
- Retrieved files from Amazon AWS S3 buckets to the production cluster.
- Used GitHub repositories for all source code maintenance.
- Granted and revoked user privileges on both the dev and prod clusters.
- Ran daily health tests on the services running on the Cloudera cluster.
- Downloaded data from FTP to the local clusters with shell scripts and loaded it into target tables using partitioning and bucketing techniques.
- Wrote Python scripts using Elasticsearch and AWS S3 services to pull data into HDFS as well as the Keysight database.
- Monitored all crontab and Oozie jobs and debugged issues when any job failed to complete.
- Involved in data analytics and reporting with Tableau.
- Sent a daily status report on cluster services to the Scrum Master.
- Involved in the ETL batch process for the Keysight client.
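As a small illustration of the Spark SQL and DataFrame work noted above, the following is a minimal Scala sketch rather than production code; the database, table, and column names (sales.orders, region, amount) are hypothetical, and a Hive-enabled Spark session is assumed.

```scala
import org.apache.spark.sql.SparkSession

object HiveDataFrameSketch {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session; assumes hive-site.xml is available on the classpath.
    val spark = SparkSession.builder()
      .appName("hive-dataframe-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read an existing Hive table into a DataFrame (table/column names are hypothetical).
    val orders = spark.sql("SELECT order_id, region, amount FROM sales.orders")

    // Aggregate with the DataFrame API and write the result back to Hive
    // for downstream reporting (e.g. Tableau).
    orders.groupBy("region")
      .sum("amount")
      .withColumnRenamed("sum(amount)", "total_amount")
      .write.mode("overwrite")
      .saveAsTable("sales.region_totals")

    spark.stop()
  }
}
```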
TECHNICAL SKILLS:
Big Data: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, ElasticSearch
Languages: Core Java, C, C++, Python (Django), C#, VC++, Go
Methodologies: Agile, UML, Design Patterns
Primary Languages: Scala, Core Java
Databases: Oracle, MySQL, Cassandra, HBase
Application Server: Apache Tomcat
Web Tools: HTML, JavaScript, XPath, XQuery
Tools: SQL Developer, Toad, GitHub
IDE: STS, Eclipse
Operating System: Windows, Unix/Linux
Scripts: Bash, Python, Perl
PROFESSIONAL EXPERIENCE:
Confidential, San Jose, CA
Hadoop/Big Data Architect
Responsibilities:
- Extensively involved in the design phase and delivered design documents; worked across the Hadoop ecosystem with HDFS, Hive, Sqoop, and Spark with Scala.
- Used Spark to improve the performance and optimization of existing Hadoop algorithms, working with SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
- Analyzed the Hadoop cluster and various big data components including Hive, Spark, Kafka, Elasticsearch, Oozie, and Sqoop; imported and exported data into HDFS and Hive using Sqoop.
- Worked on data lake processing of manufacturing data with Spark and Scala, storing results in Hive tables for further analysis with Tableau.
- Developed Hive (version 1.2.1) scripts to meet end-user/analyst requirements for ad hoc analysis.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external tables to optimize performance (see the partitioned-table sketch following this job entry).
- Solved performance issues in Hive by understanding joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Defined job flows; used Hive to analyze partitioned and bucketed data and compute various metrics for reporting; managed and reviewed Hadoop log files.
- Extracted data from Teradata/RDBMS into HDFS using Sqoop (version 1.4.6).
- Performed data ingestion with Spark SQL and created Spark DataFrames.
Environment: Spark, Hive, Scala, JDK, UNIX Shell Scripting, IntelliJ, Toad, GitHub.
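A minimal Scala sketch of the external/partitioned Hive table pattern referenced in this job entry; the database, columns, and HDFS location are hypothetical placeholders, and a Hive-enabled Spark session is assumed.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS mfg")

    // External table over raw files already landed in HDFS
    // (database, columns, and location are hypothetical).
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS mfg.test_results_raw (
        |  unit_id STRING,
        |  station STRING,
        |  measurement DOUBLE,
        |  test_date STRING)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |LOCATION '/data/mfg/test_results_raw'""".stripMargin)

    // Curated table partitioned by test_date so analytical queries can prune partitions.
    spark.read.table("mfg.test_results_raw")
      .write
      .mode("overwrite")
      .partitionBy("test_date")
      .format("parquet")
      .saveAsTable("mfg.test_results")

    spark.stop()
  }
}
```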
Confidential, New York, NY
Hadoop/Big Data Lead Developer
Responsibilities:
- Worked on a live 100-node Hadoop cluster running Apache/MapR distributions.
- Worked with highly unstructured and semi-structured data of 30 TB in size.
- Extensively involved in the design phase and delivered design documents; worked across the Hadoop ecosystem with HDFS, Hive, Sqoop, and Spark with Scala.
- Used Spark to improve the performance and optimization of existing Hadoop algorithms, working with SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
- Analyzed the Hadoop cluster and various big data components including Hive, Spark, Kafka, Elasticsearch, Oozie, and Sqoop; imported and exported data into HDFS and Hive using Sqoop.
- Worked on data lake processing of manufacturing data with Spark and Scala, storing results in Hive tables for further analysis with Tableau.
- Developed Hive (version 1.2.1) scripts to meet end-user/analyst requirements for ad hoc analysis.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external tables to optimize performance.
- Solved performance issues in Hive by understanding joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Defined job flows; used Hive to analyze partitioned and bucketed data and compute various metrics for reporting; managed and reviewed Hadoop log files.
- Extracted data from Teradata/RDBMS into HDFS using Sqoop (version 1.4.6).
- Performed data ingestion with Spark SQL and created Spark DataFrames.
- Strong knowledge of multi-cluster environments and setting up the Cloudera Hadoop ecosystem; experience in installation, configuration, and management of Hadoop clusters.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed big data/Hadoop logs.
- Developed shell scripts for Oozie workflows.
- Used the BI reporting tool Tableau to generate reports.
- Integrated Talend with Hadoop for processing big data jobs.
- Good knowledge of Kafka (see the streaming sketch following this job entry).
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Installed and configured Storm, Solr, Flume, Sqoop, Pig, Hive, and HBase on Hadoop clusters.
- Pulled cloud data from AWS S3 into the Keysight database.
- Wrote Python scripts using Elasticsearch on AWS to retrieve customer-based data more efficiently.
- Used Core Java for batch processing of incremental item data files.
- Used ETL tools to process various data files for Keysight applications.
- Used Azure for external vendor data processing.
Environment: Core Java, multi-node installation, MapReduce, Spark, Kafka, Hive, Impala, Zookeeper, Oozie, Java, Python scripting, Scala, JDK, UNIX shell scripting, AWS, MySQL, Eclipse, Toad, GitHub.
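A minimal sketch of Kafka-based streaming in Scala, using Spark Structured Streaming (the DataFrame-based API) rather than the older DStream API; the broker address and topic name are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-stream-sketch")
      .getOrCreate()

    // Subscribe to a Kafka topic (broker address and topic name are hypothetical).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "sensor-events")
      .load()
      .select(col("value").cast("string").as("event"))

    // Write the raw events to the console; a real job would parse and
    // persist them (e.g. to HDFS or a Hive table) instead.
    val query = events.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/sensor-events")
      .start()

    query.awaitTermination()
  }
}
```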
Confidential
Hadoop/Big Data Developer
Responsibilities:
- Worked on a live 20-node Hadoop cluster running Apache/CDH 5.8/Hortonworks distributions.
- Worked with highly unstructured and semi-structured data of 10 TB in size.
- Extensively involved in the design phase and delivered design documents; worked across the Hadoop ecosystem with HDFS, Hive, Sqoop, and Spark with Scala.
- Used Spark to improve the performance and optimization of existing Hadoop algorithms, working with SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
- Analyzed the Hadoop cluster and various big data components including Hive, Spark, Kafka, Elasticsearch, Oozie, and Sqoop; imported and exported data into HDFS and Hive using Sqoop.
- Worked on data lake processing of manufacturing data with Spark and Scala, storing results in Hive tables for further analysis with Tableau.
- Wrote Pig (version 0.15) scripts extensively to transform raw data from several big data sources into baseline datasets.
- Developed Hive (version 1.2.1) scripts to meet end-user/analyst requirements for ad hoc analysis.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external tables to optimize performance.
- Solved performance issues in Hive and Pig scripts by understanding joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Defined job flows; used Hive to analyze partitioned and bucketed data and compute various metrics for reporting; managed and reviewed Hadoop log files.
- Extracted data from Teradata/RDBMS into HDFS using Sqoop (version 1.4.6).
- Created and ran Sqoop (version 1.4.6) jobs with incremental loads to populate Hive external tables (see the incremental-ingestion sketch following this job entry).
- Performed data ingestion with Spark SQL and created Spark DataFrames.
- Strong knowledge of multi-cluster environments and setting up the Cloudera Hadoop ecosystem; experience in installation, configuration, and management of Hadoop clusters.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed big data/Hadoop logs.
- Developed shell scripts for Oozie workflows.
- Used the BI reporting tool Tableau to generate reports.
- Integrated Talend with Hadoop for processing big data jobs.
- Good knowledge of Kafka.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Installed and configured Storm, Solr, Flume, Sqoop, Pig, Hive, and HBase on Hadoop clusters.
- Pulled cloud data from AWS S3 into the Keysight database.
- Wrote Python scripts using Elasticsearch on AWS to retrieve customer-based data more efficiently.
- Used Core Java for batch processing of incremental item data files.
- Used ETL tools to process various data files for Keysight applications.
- Used Azure for external vendor data processing.
Environment: Core Java, multi-node installation, Azure, MapReduce, Spark, Kafka, Hive, Impala, Zookeeper, Oozie, Java, Python scripting, Scala, JDK, UNIX shell scripting, AWS, MySQL, Eclipse, Toad, GitHub.
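Sqoop incremental jobs themselves are defined on the command line; as a rough Scala analogue of the incremental-load pattern above, the sketch below pulls only new rows over JDBC and appends them to a curated table. The JDBC URL, source table, watermark value, and credentials are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object IncrementalLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("incremental-load-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Watermark from the previous run; in practice this would come from
    // a control table or a workflow property (value here is hypothetical).
    val lastLoadedId = 1000000L

    // Pull only the new rows from the source database
    // (JDBC URL, credentials, and table name are hypothetical; the driver
    // jar is assumed to be on the classpath).
    val newRows = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/sales")
      .option("dbtable", s"(SELECT * FROM orders WHERE order_id > $lastLoadedId) AS incr")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Append the increment to the curated Hive table.
    newRows.write.mode("append").saveAsTable("sales.orders_curated")

    spark.stop()
  }
}
```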
Confidential
Hadoop / Big Data Developer
Responsibilities:
- Developed high-integrity Spark programs for systems where predictable and highly reliable operation is essential.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch following this job entry).
- Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing.
- Developed analytical components using Scala, Spark, and Spark Streaming.
- Analyzed SQL scripts and designed solutions implemented with PySpark.
- Developed several advanced MapReduce programs in Java and Python as part of functional requirements for big data.
- Developed Hive (version 1.1.1) scripts as part of functional requirements, along with Hadoop security using Kerberos.
- Worked with the admin team on designing and upgrading CDH 3 to CDH 4.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Assisted the Hadoop team with developing MapReduce scripts in Python.
- Built, tuned, and maintained HiveQL and Pig scripts for loading, filtering, and storing data and for user reporting; created Hive tables, loaded data, and wrote Hive queries.
- Imported data from various sources, performed transformations using Spark, and loaded the data into Cassandra.
- Worked extensively on the Spark Core, Spark SQL, and Spark Streaming modules.
- Used Scala to write code for all Spark use cases.
- Used Spark to improve the performance and optimization of existing Hadoop algorithms, working with SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
- Wrote Perl scripts using Elasticsearch on AWS to pull customer data from the eBay cloud.
- Developed a data lake for customer data, using Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Used GitHub for source code version control process.
Environment: Core Java, multi-node installation, MapReduce, Spark, Kafka, Hive, Impala, Flume, Storm, Zookeeper, Oozie, Java, Python scripting, Scala, JDK, UNIX shell scripting, AWS, TestNG, MySQL, Eclipse, Toad, GitHub.
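As an illustration of converting a Hive/SQL aggregation into Spark RDD transformations in Scala (mentioned in this job entry), the following is a minimal sketch; the input path and column layout are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object SqlToRddSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-to-rdd-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Equivalent of:
    //   SELECT region, SUM(amount) FROM orders GROUP BY region
    // expressed as RDD transformations over CSV files
    // (path and column layout are hypothetical).
    val totals = sc.textFile("hdfs:///data/orders/*.csv")
      .map(_.split(","))                        // order_id, region, amount
      .filter(_.length >= 3)
      .map(cols => (cols(1), cols(2).toDouble)) // (region, amount)
      .reduceByKey(_ + _)

    totals.collect().foreach { case (region, total) =>
      println(s"$region\t$total")
    }

    spark.stop()
  }
}
```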
Confidential
Sr. C++/Core Java/Script Developer
Responsibilities:
- Coordinate with onsite team for requirement gathering.
- Involved in understanding the business needs and formulated the low-level design for Hard to Borrow.
- Created various use cases using massive public data sets, along with performance tests to verify the efficacy of codegen in different modes.
- Researched, collated, and edited content related to Perl and tools; shared knowledge with peers.
- Designed the state-machine framework for Hard to Borrow using the Singleton pattern in C++.
- Created shared object (.so) files for the Hard to Borrow business logic so that any client can invoke it.
- Developed Perl jobs to insert and update SunGard data into the Sybase database in E*TRADE format.
- Carried out unit testing of the developed code via a C++ tester application, covering all possible positive and negative scenarios to maintain code quality and performance.
- Assigned logical units of work to each offshore developer (a team of 7 members).
Environment: C, C++, Core Java, Perl, Python, Linux, MySQL, Sybase.
Confidential
C++/C# developer
Responsibilities:
- Coding/enhancement of programs.
- Resolution and monitoring of problem tickets.
- Supporting on call activities.
- Status reporting, work planning, and documentation.
Environment: C, C++, C#, XML, Windows.
Confidential
C++/VC++ Developer
Responsibilities:
- Coding/enhancement of programs.
- Resolution and monitoring of problem tickets.
- Supporting on call activities.
- Status reporting, work planning, and documentation.
Environment: C++, C#, VC++, XML, MySQL, Oracle.