Sr. Hadoop/Spark Developer Resume
Dallas, TX
PROFESSIONAL SUMMARY:
- Strong knowledge of the Software Development Life Cycle (SDLC) and the role of a Hadoop/Spark developer in development methodologies such as Agile and Waterfall.
- Expertise in all components of the Hadoop ecosystem - Hive, Pig, HBase, Impala, Sqoop, HUE, Flume, Zookeeper, Oozie and Apache Spark.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce, with experience writing MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
- Hands-on experience with the YARN (MapReduce 2.0) architecture and its components, such as the ResourceManager, NodeManager, Containers and ApplicationMaster, and with the execution of a MapReduce job.
- Hands-on experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive.
- Experienced in integrating Kafka with Spark Streaming for high-speed data processing (a representative sketch follows this summary).
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data.
- Exposure to working with Spark DataFrames.
- Experience in collecting log data from different sources (web servers and social media) using Flume and Kafka, storing it in HDFS and running MapReduce jobs over it.
- Worked with data serialization formats, converting complex objects into sequences of bits using the Avro, Parquet and CSV formats.
- Strong knowledge of Pig and Hive analytical functions; extended Hive and Pig core functionality by writing custom UDFs.
- Expertise in developing Pig Latin scripts and Hive Query Language (HiveQL) for data analytics.
- Well-versed in partitioning, dynamic partitioning and bucketing concepts, and implemented them in Hive to compute data metrics.
- Integrated BI tools such as Tableau with Impala and analyzed the data.
- Experience with NoSQL databases like HBase, MongoDB and Cassandra.
- Experience in using Sqoop to import and export data between HDFS and relational/non-relational database systems.
- Used the Oozie job scheduler to schedule MapReduce jobs and automate job flows, and implemented cluster coordination services using Zookeeper.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Knowledge of creating visualizations in Tableau, including bar, line and pie charts, maps, scatter plots, histograms and highlight tables, and of applying local and global filters according to end-user requirements.
- Knowledge of designing and creating analytical reports and automated dashboards that help users identify critical KPIs and facilitate strategic planning in the organization.
- Experience in working with different relational databases like MySQL and Oracle.
- Strong experience in database design and in writing complex SQL queries and stored procedures.
- Expertise in various phases of software development, including analysis, design, development and deployment of applications using Servlets, JSP, Java Beans, Struts, the Spring Framework and JDBC.
- Experience with development environments such as Eclipse and NetBeans.
- Proficient in software documentation and technical report writing.
- Versatile team player with good communication, analytical, presentation and interpersonal skills.
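A minimal sketch of the Kafka-with-Spark-Streaming integration described above, assuming the spark-streaming-kafka-0-10 connector; the broker address, group id and "web-logs" topic are illustrative placeholders:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaSparkStreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaSparkStreamingSketch")
    val ssc  = new StreamingContext(conf, Seconds(5)) // 5-second batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",          // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "log-analytics"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("web-logs"), kafkaParams))

    // Count records in each micro-batch; real logic would parse and aggregate events.
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```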
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Impala, YARN, HUE, Oozie, Zookeeper, Apache Spark, Apache Storm, Apache Kafka, Sqoop, Flume
Operating Systems: Windows, Ubuntu, RedHat Linux, Unix
Programming Languages: C, C++, Java, Python, Scala
Scripting Languages: Shell Scripting, JavaScript
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, SQL, PL/SQL, Teradata
NoSQL Databases: HBase, Cassandra, and MongoDB
Hadoop Distributions: Cloudera, Hortonworks
Build Tools: Ant, Maven, sbt
Development IDEs: NetBeans, Eclipse IDE
Web Servers: WebLogic, WebSphere, Apache Tomcat 6
Cloud: AWS
Version Control Tools: SVN, Git, GitHub
Packages: Microsoft Office, PuTTY, MS Visual Studio
PROFESSIONAL EXPERIENCE:
Confidential, Dallas, TX
Sr. Hadoop/Spark Developer
Responsibilities:
- Developed a data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest customer behavioral data into HDFS for analysis.
- Developed Sqoop scripts for importing and exporting data to and from HDFS and Hive.
- Developed design documents considering all possible approaches and identifying the best of them.
- Developed services to run MapReduce jobs on an as-required basis.
- Responsible for managing data coming from different sources.
- Developed business logic using Scala.
- Responsible for loading data from Unix file systems into HDFS. Installed and configured Hive and wrote Pig/Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Wrote MapReduce programs to convert text files into Avro and load them into Hive tables.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Developed scripts to automate data management end to end and keep all the clusters in sync.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Imported results into the BI visualization tool Tableau to create dashboards.
- Worked in an Agile methodology and used JIRA to maintain project stories.
- Involved in requirements gathering, design, development and testing.
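A minimal sketch of the kind of Hive-to-Spark conversion mentioned above; the events table and customer_id column are hypothetical, with the equivalent HiveQL shown in the comment:

```scala
import org.apache.spark.sql.SparkSession

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkSketch")
      .enableHiveSupport() // read tables from the existing Hive metastore
      .getOrCreate()

    // Equivalent of the Hive query:
    //   SELECT customer_id, COUNT(*) FROM events GROUP BY customer_id
    // expressed as Spark RDD transformations.
    val counts = spark.table("events")
      .rdd
      .map(row => (row.getAs[String]("customer_id"), 1L))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```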
Environment: Hive, Flume, Java, Maven, Impala, Pig, Spark, Oozie, Oracle, YARN, GitHub, Tableau, Unix, Cloudera, Sqoop, HDFS, Scala, Cassandra.
Confidential, MN
Sr. Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Installed and configured Hadoop clusters and ecosystem components.
- Worked with business stakeholders to understand requirements and business use cases.
- Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into those tables, and writing Hive queries to further analyze the logs and identify issues and behavioral patterns.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive (see the sketch after this list).
- Experienced in performance tuning of Spark applications, such as setting the right batch interval time.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
- Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment, working with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
- Loaded data from large data files into Hive tables and HBase NoSQL tables.
- Worked on a POC comparing the processing time of Impala with that of Apache Hive for batch applications, with a view to adopting the former in the project.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Implemented schema extraction for the Parquet and Avro file formats in Hive.
- Used reporting tools such as Tableau, connected to Hive, for generating daily reports of data.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
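A minimal sketch of the Spark-on-Hive analytics mentioned above, as it might be submitted to YARN; the app_logs table, its columns and the output path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object HiveAnalyticsSketch {
  def main(args: Array[String]): Unit = {
    // Typically submitted with spark-submit --master yarn on a Cloudera cluster.
    val spark = SparkSession.builder()
      .appName("HiveAnalyticsSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical daily aggregation over an existing Hive table.
    val daily = spark.sql(
      """SELECT event_date, COUNT(*) AS events
        |FROM app_logs
        |GROUP BY event_date""".stripMargin)

    // Persist the results as Parquet, partitioned by date, for downstream reporting.
    daily.write.mode("overwrite").partitionBy("event_date").parquet("/warehouse/reports/daily_events")

    spark.stop()
  }
}
```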
Environment: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, Zookeeper, Scala, Spark, Linux Shell Scripting, Tableau.
Confidential, Peoria, IL
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including MapReduce and Hive.
- Involved in loading data from the Linux file system, servers and Java web services using Kafka producers and partitions.
- Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
- Implemented Kafka consumers to read data from Kafka partitions and move it into HDFS (see the sketch after this list).
- Reimplemented MapReduce programs as Spark transformations using Spark and Scala; migrated complex MapReduce programs to Spark RDD transformations and actions.
- Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
- Imported and exported data between HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Implemented partitioning, dynamic partitions and buckets in Hive. Responsible for managing data coming from different sources.
- Monitored running MapReduce programs on the cluster. Responsible for loading data from Unix file systems into HDFS.
- Installed and configured Hive and made use of Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Managed IT and business stakeholders; conducted assessment interviews and solution review sessions.
- Experience in Agile programming and accomplishing tasks to meet deadlines.
- Reviewed developed code and flagged any issues with respect to customer data.
- Used SQL queries and other tools to perform data analysis and profiling.
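A minimal sketch of a Kafka consumer that lands records in HDFS, as described above; the broker, group id, topic and output path are illustrative placeholders, and a production version would roll files and manage offsets more carefully:

```scala
import java.time.Duration
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.kafka.clients.consumer.KafkaConsumer

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // hypothetical broker
    props.put("group.id", "hdfs-sink")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(java.util.Collections.singletonList("web-logs"))

    val fs  = FileSystem.get(new Configuration()) // cluster config picked up from the classpath
    val out = fs.create(new Path("/data/raw/web-logs.txt"))

    try {
      while (true) {
        // Poll a batch of records and append each value as a line in HDFS.
        val records = consumer.poll(Duration.ofSeconds(1))
        records.asScala.foreach(r => out.writeBytes(r.value + "\n"))
        out.hflush() // make the data visible to HDFS readers
      }
    } finally {
      out.close()
      consumer.close()
    }
  }
}
```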
Environment: Hadoop, HDFS, MapReduce, HBase, Hive, Kafka, Flume, Cloudera, Eclipse (Juno), Java, MySQL and Oracle 10g.
Confidential, Cleveland, OH
Hadoop Developer
Responsibilities:
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
- Used Sqoop to transfer data between databases (Oracle & Teradata) and HDFS and used Flume to stream the log data from servers.
- Developed MapReduce programs for pre-processing and cleansing the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Experienced in managing and reviewing Hadoop log files.
- Developed MapReduce programs in Java for parsing the raw data and populating staging tables.
- Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log files.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Worked extensively with combiners, partitioning and the distributed cache to improve the performance of MapReduce jobs.
- Implemented different analytical algorithms as MapReduce programs to apply on top of HDFS data.
- Implemented Hive generic UDFs to encapsulate business logic around custom data types (a simplified sketch follows this list).
- Used Pig to perform data transformations, event joins, filtering and some pre-aggregations before storing the data in HDFS.
- Conducted data extraction, including analyzing, reviewing and modeling based on requirements, using higher-level tools such as Hive and Pig.
- Implemented partitions and buckets in Hive for optimization.
- Involved in creating Hive tables, loading structured data and writing Hive queries that run internally as MapReduce jobs.
- Created HBase tables to store the various formats of data coming from different portfolios.
- Experience in troubleshooting MapReduce jobs by reviewing log files.
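A simplified sketch of a custom Hive UDF of the kind described above (a full GenericUDF adds ObjectInspector-based type handling); the function name and normalization logic are hypothetical:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF that normalizes free-text codes before analysis.
// Registered in Hive after packaging the jar, e.g.:
//   ADD JAR normalize-udf.jar;
//   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
class NormalizeCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```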
Environment: Hadoop, MapReduce, HiveQL, MySQL, HBase, HDFS, Hive, Impala, Pig, Sqoop, Oozie, Flume, Cloudera, Zookeeper, Hue Editor, Eclipse, Oracle 11g, PL/SQL, SQL*Plus, UNIX, Tableau.
Confidential
Java/J2EE Developer
Responsibilities:
- Actively involved in the analysis, definition, design, implementation and deployment of full Software Development Life Cycle (SDLC) of the project.
- Extensively used the Java collections framework (lists, sets, maps and queues).
- Designed various parts of the application using multi-threading concepts, mostly to perform time-consuming tasks in the background.
- Proficient in developing static web applications with HTML5, CSS3, XHTML.
- Completely involved in back-end development (Business Layer) of the application using Java/J2EE technologies.
- Worked in all the modules of the application which involved front-end presentation logic developed using Spring MVC, JSP, JSTL, Servlets and data access layer using Hibernate framework.
- Used joins, triggers, stored procedures and functions to interact with the backend database via SQL.
- Experience in developing middle-tier components in a distributed transaction management system using Java. Good understanding of XML methodologies (XML, XSL, XSD), including web services and SOAP.
- Responsible for periodic generation of reports.
- Performed Unit testing of the application using JUNIT.
- Developed an Ant script for compilation and deployment.
- Reviewed changes on a weekly basis and ensured the quality of deliverables.
- Used the Eclipse IDE to deploy the application on a Tomcat server.
- Used SVN as centralized version control system and log4j for logging.
- Documented the events, workflows, code changes, bugs fixes related to enhancing new features and correcting code defects.
Environment: Java, J2EE, UML, Struts, HTML, Log4j, CSS, JavaScript, Oracle 9i, SQL*Plus, PL/SQL, MS Access, UNIX Shell Scripting.
Confidential
Java Developer
Responsibilities:
- Full life-cycle experience, including requirements gathering, business analysis, system architecture, software architecture, data design, coding and testing, using the Waterfall methodology.
- Responsible for coding and implementing MVC2 with JSP, Struts and Hibernate.
- Developed JSP custom tags using Tag Libraries.
- Involved in all activities of the Admin and Student modules.
- Involved in coding JSP pages, form beans and action classes in Struts.
- Involved in database connectivity through JDBC.
- Involved in writing DAOs.
- Involved in integration testing and user acceptance testing (UAT).
- Involved in production support for the application.
Environment: Java, Java Servlets, Struts, Hibernate, JSP, JavaScript, HTML, XHTML, CSS, Log4j, Tomcat 5.5, MySQL 5.0.