Hadoop-Spark Developer Resume
San Jose, CA
SUMMARY:
- 8+ years of experience in software design, development, implementation, and support of applications built on Big Data (Hadoop) and Java technologies.
- 3.6 years of experience with the Hadoop ecosystem, including Spark, Scala, HDFS, MapReduce, Hive, Pig, Storm, Kafka, YARN, HBase, Oozie, ZooKeeper, Flume, and Sqoop.
- Assisted in cluster maintenance, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
- Skilled in using analytical tools to mine data, perform predictive analysis, evaluate underlying patterns, and implement complex algorithms for data analysis.
- 1.5 years of hands-on experience with Spark, Spark Streaming, Spark MLlib, and Scala.
- Created and worked with DataFrames in Spark using Scala.
- Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL (a brief sketch follows this summary).
- Developed Pig Latin and Spark SQL scripts for data transformation.
- Hands-on experience with real-time data tools such as Kafka and Storm.
- Developed Sqoop scripts for importing large datasets from RDBMS into HDFS.
- Created UDFs in Java and registered them in Pig and Hive.
- Good understanding of Spark architecture and its components.
- Experience in writing Pig Latin scripts.
- Experience in writing UDFs in Java for Pig and Hive.
- Efficient in writing MapReduce programs for analyzing structured and unstructured data.
- Expertise in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience in using Apache Sqoop to import and export data to and from HDFS and Hive.
- Hands-on experience setting up workflows with the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
- Experience in scheduling jobs using Oozie Coordinator, Oozie Bundle, and crontab.
Cloud Infrastructure:
- Experience with AWS components such as Amazon EC2 instances, S3 buckets, CloudFormation templates, and the Boto library.
- Experience with Azure components such as Azure SQL Database and Data Factory.
- Experienced in working with different file formats: Avro, Parquet, RCFile, and ORC.
- Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
- Experienced and skilled Agile developer with a strong record of excellent teamwork and successful coding.
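To ground the Spark SQL and UDF points above, here is a minimal sketch in Scala of registering a UDF and querying a DataFrame; the dataset, column names, and UDF logic are illustrative assumptions rather than code from any specific engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object SparkSqlUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Illustrative DataFrame; in practice the data would come from HDFS or Hive.
    val orders = Seq((1, "Retail ", 120.0), (2, "WHOLESALE", 890.5))
      .toDF("order_id", "channel", "amount")

    // Register the same normalization logic for both the DataFrame and SQL APIs.
    val normalizeChannel = udf((c: String) => c.trim.toLowerCase)
    spark.udf.register("normalize_channel", (c: String) => c.trim.toLowerCase)

    orders.withColumn("channel_norm", normalizeChannel($"channel")).show()

    orders.createOrReplaceTempView("orders")
    spark.sql(
      """SELECT normalize_channel(channel) AS channel, SUM(amount) AS total
        |FROM orders
        |GROUP BY normalize_channel(channel)""".stripMargin
    ).show()

    spark.stop()
  }
}
```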
TECHNICAL SKILLS:
Hadoop Technologies and Distributions: Apache Hadoop, Cloudera Distribution of Hadoop (CDH3, CDH4, CDH5), and Hortonworks Data Platform (HDP)
Hadoop Ecosystem: HDFS, Hive, Pig, Sqoop, Oozie, Flume, Spark, ZooKeeper, MapReduce, Spark SQL, Spark Streaming, and Spark MLlib
NoSQL Databases: HBase, Cassandra
Programming: C, C++, Python, Java, Scala, PL/SQL, SBT, Maven
RDBMS: Oracle, MySQL, SQL Server
Web Development: HTML, JSP, Servlets, JavaScript, CSS, XML
IDE: Eclipse 4.x, NetBeans, Microsoft Visual Studio
Operating Systems: Linux (Red Hat, CentOS), Windows XP/7/8, and z/OS (mainframes)
Web Servers: Apache Tomcat
Cluster Management Tools: Cloudera Manager, Hortonworks Ambari, and Hadoop security tools
PROFESSIONAL EXPERIENCE:
Confidential, San Jose, CA
Hadoop-Spark Developer
Responsibilities:
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Worked on a cluster of 135 nodes.
- Implemented Spark jobs in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Created RDDs, DataFrames, and Datasets.
- Created Hive tables and loaded data from Teradata using Sqoop.
- Worked on tuning back-end stored procedures using TOAD.
- Designed ETL jobs for data processing in Talend Open Studio.
- Used the ORC and Parquet file formats for storing data.
- Wrote Java code to execute SQL queries, including code to read the SQL queries from text files.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Used Sqoop to transfer data between RDBMS and Hadoop Distributed File System.
- Used Eclipse for the Development, Testing and Debugging of the application.
- Wrote Python scripts to move data from cluster to cluster.
- Used the Log4j framework for logging debug, info, and error messages.
- Created Hive external and managed tables.
- Designed and maintained Airflow workflow configurations to manage the flow of jobs in the cluster.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Used Spark shared variables, broadcast variables and accumulators, when values needed to be shared across nodes (a brief sketch follows this list).
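A minimal sketch, on assumed data, of the shared-variable pattern mentioned in the last bullet: a broadcast lookup map reused on every executor, plus an accumulator counting records that fail the lookup. The region codes and record layout are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object SharedVariablesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shared-variables-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Broadcast a small lookup table once so every executor reuses the same copy.
    val regionLookup = sc.broadcast(Map("01" -> "WEST", "02" -> "EAST"))

    // Accumulator to count records whose region code is unknown.
    val badRecords = sc.longAccumulator("badRecords")

    val records = sc.parallelize(Seq("01,100", "02,250", "99,75"))
    val enriched = records.map { line =>
      val Array(code, amount) = line.split(",")
      val region = regionLookup.value.getOrElse(code, {
        badRecords.add(1)
        "UNKNOWN"
      })
      (region, amount.toDouble)
    }

    enriched.collect().foreach(println)
    println(s"records with unknown region: ${badRecords.value}")
    spark.stop()
  }
}
```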
Environment: Hadoop, MapReduce, Hive, Pig, Spring Batch, Scala, Sqoop, Bash scripting, Spark RDD, Spark SQL, Spark DataFrames
Confidential, Horsham, PA
Hadoop Developer
Responsibilities:
- Involved in Design and Development of technical specifications.
- Wrote shell scripts to pull data from the Tumbleweed server to the Cornerstone staging area.
- Converted data from EBCDIC to ASCII format.
- Wrote Sqoop commands to pull data from the Teradata source.
- Wrote Pig scripts to preprocess data before loading it into Cornerstone.
- Optimized Hive scripts.
- Registered feed metadata in MySQL tables.
- Wrote shell scripts and scheduled jobs through UNIX cron.
- Wrote job workflows using Spring Batch.
- Worked on project deployment from the Gold cluster to the Platinum cluster.
- Provided support for the production (PRD) support team.
- Closely worked with Hadoop security team and infrastructure team to implement security.
- Implemented authentication and authorization services using the Kerberos protocol.
- Designed and implemented streaming data display in the UI with Scala.js.
- Hands-on experience with systems-building languages such as Scala and Java.
- Wrote programs for validating, normalizing, and enriching data, and a REST API to support a UI for manual QA validation; used Spark SQL and Scala to run QA-oriented SQL queries.
- Created RDDs and pair RDDs for Spark programming.
- Implemented joins, grouping, and aggregations on the pair RDDs (see the sketch after this list).
- Saved the results in Hive for downstream consumers to access.
- Used DataFrames for data transformations.
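A small sketch of the pair-RDD join/aggregation pattern described in the bullets above, with the aggregated result saved to Hive for downstream access; the claims/members data, database, and table names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object PairRddJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pair-rdd-join-sketch")
      .enableHiveSupport()
      .getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._

    // Illustrative pair RDDs keyed by member id.
    val claims  = sc.parallelize(Seq(("m1", 120.0), ("m1", 80.0), ("m2", 45.0)))
    val members = sc.parallelize(Seq(("m1", "PA"), ("m2", "NJ")))

    // Aggregate claim amounts per member, then join with member attributes.
    val totals = claims.reduceByKey(_ + _)
    val joined = totals.join(members) // (memberId, (totalAmount, state))

    // Re-key by state, aggregate again, and save the result to Hive.
    val byState = joined
      .map { case (_, (total, state)) => (state, total) }
      .reduceByKey(_ + _)
      .toDF("state", "total_claim_amount")

    byState.write.mode("overwrite").saveAsTable("qa_db.claims_by_state")
    spark.stop()
  }
}
```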
Environment: Hadoop, MapReduce, Hive, Pig, Spring Batch, Scala, Sqoop, Bash scripting, Spark RDD, Spark SQL.
Confidential, Kansas City, MO
Hadoop Developer
Responsibilities:
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Involved in installing, configuring and managing Hadoop Ecosystem components like Spark, Hive, Pig, Sqoop, Kafka and Flume.
- Involved in installing Hadoop and Spark clusters on Amazon Web Services.
- Worked with Amazon EC2 instances, S3 buckets, CloudFormation templates, and the Boto library.
- Migrated the existing data to Hadoop from RDBMS (SQL Server and Oracle) using Sqoop for processing.
- Responsible for data ingestion with tools such as Flume and Kafka.
- Responsible for loading unstructured and semi-structured data from different sources into the Hadoop cluster using Flume, and for managing that data.
- Developed Spark programs for batch and real-time processing.
- Developed Spark Streaming applications for real-time processing (a brief sketch follows this list).
- Developed MapReduce programs to cleanse and parse data in HDFS obtained from various data sources and to perform joins on the Map side using distributed cache.
- Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.
- Created internal and external tables with properly defined static and dynamic partitions for efficiency.
- Used the RegEx, JSON, and Avro SerDes packaged with Hive for serialization and deserialization when parsing the contents of streamed log data.
- Implemented custom Hive UDFs for comprehensive data analysis.
- Used Talend Open Studio to load files into Hive tables and performed ETL aggregations in Hive.
- Designed and created ETL jobs in Talend to load large volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
- Implemented authentication and authorization services using the Kerberos protocol.
- Used Pig to develop ad-hoc queries.
- Exported the business required information to RDBMS using Sqoop to make the data available for BI team to generate reports based on data.
- Implemented daily workflow for extraction, processing and analysis of data with Oozie.
- Responsible for troubleshooting MapReduce jobs by reviewing the log files.
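As an illustration of the Spark Streaming work noted above, a minimal sketch of a direct Kafka stream that counts events per key in each micro-batch; the broker address, topic, group id, and record format are placeholder assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-streaming-sketch")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Kafka connection details are illustrative placeholders.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-consumer",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

    // Count events per page in each 10-second batch and print a sample.
    stream.map(record => (record.value.split(",")(0), 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```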
Environment: Hadoop, Spark, Spark Streaming, Spark MLlib, Scala, Hive, Pig, HCatalog, MapReduce, Oozie, Sqoop, Flume, Kafka, Kerberos.
Confidential, Grand Rapids, Michigan
Hadoop Developer
Responsibilities:
- Loaded files into HDFS and wrote Hive queries to process the required data.
- Loaded data into Hive tables and wrote queries to process it.
- Involved in loading data from LINUX file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Experience in managing and reviewing Hadoop log files.
- Worked with Hive to expose data for further analysis and to transform files from different analytical formats into text files.
- Imported and exported data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Worked on configuring multiple MapReduce Pipelines, for the new Hadoop Cluster.
- Performance tuned and optimized Hadoop clusters to achieve high performance.
- Written Hive queries for data analysis to meet the business requirements.
- Monitored system health and logs and responded to any warning or failure conditions.
- Responsible for managing test data coming from different sources.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Implemented schedulers on the JobTracker to share cluster resources among users' MapReduce jobs.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
Environment: Hadoop, MapReduce, HDFS, Hive 0.10.1, Java, Cloudera distribution of Hadoop, Pig 0.11.1, HBase 0.94.1, Linux, Sqoop 1.4.4, Kafka, ZooKeeper 3.4.3, Oozie 3.3.0, Tableau.
Confidential
Java Developer
Responsibilities:
- Used Microsoft Visio and Rational Rose to design the use case diagrams, class model, sequence diagrams, and activity diagrams for the SDLC process of the application.
- Deployed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
- Configured the project on WebSphere 6.1 application servers.
- Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
- Communicated with other health care systems using Web Services with the help of SOAP, WSDL, and JAX-RPC.
- Used the Singleton, Factory, and DAO design patterns based on the application requirements.
- Used SAX and DOM parsers to parse the raw XML documents
- Used RAD as Development IDE for web applications.
- Prepared and executed unit test cases.
- Used the Log4j logging framework to write log messages at various levels.
- Involved in fixing bugs and minor enhancements for the front-end modules.
- Performed functional and technical reviews.
- Supported the testing team during system testing, integration testing, and UAT.
- Ensured quality in the deliverables.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Was part of the complete project life cycle, from requirements through production support.
- Created test plan documents for all back-end database modules.
- Implemented the project in Linux environment.
Environment: JDK 1.5, JSP, WebSphere, JDBC, EJB 2.0, XML, DOM, SAX, XSLT, CSS, HTML, JNDI, Web Services.
