Big Data Developer Resume
Phoenix, AZ
SUMMARY:
- 7+ years of experience in Information Technology, including 4 years of Hadoop/Big Data development and 3 years of Java/J2EE technologies.
- Comprehensive working experience in implementing Big Data projects using Apache Hadoop, Pig, Hive, HBase, Spark, Sqoop, Flume, Zookeeper, and Oozie.
- Extensive experience working with various Hadoop distributions, including enterprise versions of Cloudera (CDH4/CDH5) and MapR, along with working knowledge of Amazon EMR.
- Excellent working knowledge of the HDFS filesystem and Hadoop daemons such as ResourceManager, NodeManager, NameNode, DataNode, Secondary NameNode, and containers.
- In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
- Good experience importing data using Sqoop and SFTP from various sources such as RDBMS, Teradata, Mainframes, and Oracle into HDFS, and performing transformations on it using Hive, Pig, and Spark.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Extensively worked on Spark using Scala on clusters for computational analytics and performed advanced analytical operations using Spark with Hive and SQL/Oracle.
- Experience in performing transformations and actions on Spark RDDs using Spark Core.
- Good working experience using broadcast variables, accumulator variables, and RDD caching in Spark.
- Experience in troubleshooting Cluster jobs using Spark UI.
- Good knowledge of Hadoop data management components such as HDFS and YARN.
- Experience in managing and reviewing Hadoop log files.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets (a brief sketch follows this summary).
- Experience in performing Extract-Transform-Load (ETL) operations on data pipelines using Pig.
- Simplified complex tasks involving interrelated data transformations and encoded data flow sequences using Pig Latin.
- Expertise in creating managed/external tables, views, partitions, buckets, and analytical functions in Hive using HQL.
- Worked on GUI-based Hive interaction tools such as Hue for querying data.
- Regularly tuned Hive and Pig queries to improve data processing and retrieval performance.
- Experience in importing and exporting data using Sqoop between HDFS and Relational Database Systems.
- Hands-on experience building data pipelines using Hadoop components Sqoop, Hive, Pig, MapReduce, Spark, Spark SQL.
- Strong knowledge of importing and exporting streaming data into HDFS using stream processing platforms such as Flume and the Kafka messaging system.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, zip, XML, and JSON.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Good understanding of Zookeeper for monitoring and managing Hadoop jobs.
- Strong Experience in NoSQL databases like HBase, Cassandra.
- Proficient in shell scripting. Proficient with application servers such as WebSphere, WebLogic, JBoss, and Tomcat.
- Experience with best practices of Web services development and Integration (both REST and SOAP).
- Used project management tools such as JIRA for handling service requests and tracking issues.
- Proficient in Java, J2EE, JDBC, Collection Framework, JSON, XML, REST, SOAP Web services.
- Experience in complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.
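Illustrative sketch (added for context, not taken from any one project): a minimal Spark-with-Scala example of the Hive external-table and broadcast-variable work described above. The table, path, and column names (customer_events, /data/events, event_dt, account_id) and the lookup map are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    // Spark session with Hive support, so DDL goes to the shared metastore
    val spark = SparkSession.builder()
      .appName("SummarySketch")
      .enableHiveSupport()
      .getOrCreate()

    // External, partitioned Hive table over data already landed in HDFS (hypothetical names)
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS customer_events (
        event_id STRING,
        account_id STRING,
        amount DOUBLE
      )
      PARTITIONED BY (event_dt STRING)
      STORED AS PARQUET
      LOCATION '/data/events'
    """)

    // Broadcast a small lookup map to every executor and use it in a transformation
    val countryByAccount = Map("a1" -> "US", "a2" -> "IN")   // hypothetical lookup data
    val lookup = spark.sparkContext.broadcast(countryByAccount)

    import spark.implicits._
    val enriched = spark.table("customer_events")
      .map(row => (row.getAs[String]("account_id"),
                   lookup.value.getOrElse(row.getAs[String]("account_id"), "UNKNOWN")))
    enriched.show(10)

Keeping the table external means dropping it removes only the metadata from the shared metastore; the Parquet files in HDFS stay in place.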
TECHNICAL SKILLS:
Data Access Tools: Hive, Pig, HBase, Solr, Impala, Spark Core, Spark SQL, Spark Streaming
Data Management: HDFS, YARN
Data Workflow: Sqoop, Flume, Kafka
Data Operation: Zookeeper, Oozie
Data Security: Ranger, Knox
Big Data Distributions: Hortonworks, Cloudera
Cloud Technologies: AWS (Amazon Web Services) EC2, S3, DynamoDB, SNS, SQS, EMR, Kinesis
Programming and Scripting Languages: Java, Scala, Pig Latin, HQL, SQL, Shell Scripting, HTML, CSS, JavaScript
IDE/Build Tools: Eclipse, IntelliJ
Java/J2EE Technologies: XML, Junit, JDBC, AJAX, JSON, JSP
Operating Systems: Linux, Windows, Kali Linux
SDLC: Agile/SCRUM, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, Phoenix, AZ
Big Data Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Hive, HBase, and Spark.
- Used HiveQL to analyze partitioned and bucketed data and executed Hive queries on Parquet tables to perform data analysis meeting the business specification logic.
- Worked with the Log4j framework for logging debug, info, and error data.
- Involved in creating Hive tables and loading and analyzing data using Hive scripts.
- Implemented static partitions, dynamic partitions, and buckets in Hive.
- Developed Hive scripts in HiveQL to denormalize and aggregate the data.
- Created Hive generic UDFs to process business logic that varies based on policy.
- Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames, and Spark SQL APIs (a brief sketch follows this list).
- Responsible for design & deployment of Spark SQL scripts and Scala shell commands based on functional specifications.
- Managed and reviewed Hadoop log files to identify issues when a job failed and to find the root cause.
- Wrote new Spark jobs in Scala to analyze customer account and payment history data.
- Experienced in handling large datasets using Spark in-memory capabilities, broadcast variables, and effective and efficient joins, transformations, and other capabilities.
- Interacted with different system groups for analysis of systems.
- Experienced in performance tuning of Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Performed unit testing using JUnit.
- Involved in sprint planning, code reviews, and daily standup meetings to discuss the progress of the application.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries.
- Developed Oozie workflows for orchestrating and scheduling the ETL process.
- Created HBase tables and column families to store the user event data.
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
- Worked to tight deadlines and provided regular progress updates against agreed milestones.
- Documented operational problems by following standards and procedures using JIRA.
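Sketch of the MapReduce-to-Spark conversion pattern referenced above, assuming a partitioned, Parquet-backed Hive table; the payments_parquet table, load_dt partition column, and the txn_count threshold are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder()
      .appName("PaymentHistoryRollup")
      .enableHiveSupport()
      .getOrCreate()

    // Read a partitioned Parquet-backed Hive table (hypothetical table/column names)
    val payments = spark.table("payments_parquet")
      .where(col("load_dt") === "2017-06-01")      // partition pruning on the partition column

    // The equivalent of an old MapReduce aggregation, expressed as DataFrame transformations
    val rollup = payments
      .groupBy("account_id")
      .agg(sum("amount").as("total_paid"),
           count("*").as("txn_count"))

    // The same logic exposed through Spark SQL for anyone more comfortable with HiveQL
    rollup.createOrReplaceTempView("payment_rollup")
    spark.sql("SELECT account_id, total_paid FROM payment_rollup WHERE txn_count > 10").show()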
Environment: Hadoop, Hive, HBase, Sqoop, Oozie, Zookeeper, MapR, Spark, Scala, shell scripting, Apache Kafka
Confidential, Alpharetta, GA
Hadoop/Spark Developer
Responsibilities:
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and Sqoop.
- Developed Spark jobs using Scala on top of YARN for interactive and batch analysis.
- Worked on importing data from HDFS to a MySQL database and vice versa using Sqoop.
- Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Responded to and resolved access and performance issues.
- Implemented a publish/subscribe model using Apache Kafka to load real-time transactions into HBase (a brief sketch follows this list).
- Analyzed the data by performing Hive queries and Pig scripts to study customer behavior.
- Created HBase tables to store variable data formats of input data coming from different portfolios.
- Involved in adding huge volumes of data as rows and columns stored in HBase.
- Imported data from structured data sources into HDFS using Sqoop incremental imports.
- Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
- Used Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
- Used Oozie workflow engine to create the workflows and automate the MapReduce, Hive and Pig jobs.
- Implemented MapReduce jobs in Hive by querying the available data.
- Experience with NoSQL databases such as HBase and MongoDB.
- Collaborated with the network, database, and BI teams to ensure data quality and availability.
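Hedged sketch of the Kafka publish/subscribe ingestion mentioned above, assuming the spark-streaming-kafka-0-10 integration is on the classpath; the broker address, topic name, and consumer group are hypothetical, and the HBase write itself is elided (a real job would turn each record into an HBase Put inside foreachPartition).

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val conf = new SparkConf().setAppName("TxnStreamToHBase")
    val ssc  = new StreamingContext(conf, Seconds(10))    // 10s micro-batches; tune per load

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",             // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "txn-loader",
      "auto.offset.reset"  -> "latest"
    )

    // Subscribe to the (hypothetical) transactions topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("transactions"), kafkaParams))

    stream.map(_.value).foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // In the real job each record would be converted to an HBase Put and written
        // through the HBase client here; printing keeps the sketch self-contained.
        records.foreach(println)
      }
    }

    ssc.start()
    ssc.awaitTermination()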
Environment: Hadoop, MapReduce, Pig, Hive, HBase, Sqoop, Oozie, Spark, Solr, shell scripting, Apache Kafka
Confidential, Long Beach, CA
Data Engineer
Responsibilities:
- Involved in gathering business requirements, design, development and testing.
- Extensive experience in writing Pig (version 0.10) scripts to transform raw data from several data sources into baseline data.
- Built complex data flows involving multiple inputs, transforms and outputs using Pig.
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files.
- Imported data from MySQL server and other relational databases to Apache Hadoop with the help of Apache Sqoop.
- Wrote MapReduce jobs in Java to process the log data.
- Very good experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
- Developed multiple POCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata (a brief sketch follows this list).
- Tested and reported defects in an Agile Methodology perspective.
- Analyzed stored data using Impala.
- Generated various marketing reports using Tableau with Hadoop as a source for data.
- Developed Unix shell scripts to load large numbers of files into HDFS from the Linux file system.
- Provided daily code contributions and worked in a test-driven development environment.
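Minimal sketch of the kind of Spark POC described above, assuming the table is already registered in the Hive metastore; the weblogs.clicks table and the output path are hypothetical. Timing the job end to end gives a number that can be set against the equivalent Hive and SQL/Teradata queries.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder()
      .appName("SparkVsHivePoc")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical click-log table exposed through the Hive metastore
    val clicks = spark.table("weblogs.clicks")

    // Time a representative aggregation for comparison with the same query in Hive
    val start = System.nanoTime()
    val dailyCounts = clicks
      .groupBy("log_date", "page")
      .agg(count("*").as("hits"))
    dailyCounts.write.mode("overwrite").parquet("/tmp/poc/daily_counts")
    val elapsedSec = (System.nanoTime() - start) / 1e9
    println(f"Spark aggregation finished in $elapsedSec%.1f s")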
Environment: Hadoop, HDFS, Hive, MapReduce, Spark, Scala, Kafka, HBase, Impala, Oozie, Java, Linux, Cloudera.
Confidential
JAVA/J2EE Developer
Responsibilities:
- Implemented the application using Agile methodology. Involved in daily scrum and sprint planning meetings.
- Actively involved in analysis, detail design, development, bug fixing and enhancement.
- Drove the technical design of the application by collecting requirements from the functional unit in the design phase of the SDLC.
- Developed microservices using RESTful services to provide all CRUD capabilities.
- Created requirement documents and designed requirements using UML diagrams, class diagrams, and use case diagrams for new enhancements.
- Developed the Application Module using several design patterns like Singleton, DAO, DTO, and MVC.
- Involved in writing JSPs, JavaScript, and Servlets to generate dynamic web pages and web content.
- Used the JBoss application server for deployment of applications.
- Developed communication among SOA services.
- Involved in creation of both service and client code for JAX-WS and used SOAPUI to generate proxy code from the WSDL to consume the remote service.
- Designed the user interface of the application using HTML5, CSS3, JavaScript, Angular JS, jQuery and AJAX.
- Designed Node.js application components through Express.
- Implemented AJAX functionality to speed up the web application.
Environment: Java, J2EE, Spring MVC, Hibernate, SOAP, REST, JAXB, JAX-RPC, AngularJS, jQuery, AJAX, JSON, JavaScript, Bootstrap, XSL, XML, Struts, DB2, JUnit, Log4j, NetBeans IDE.
Confidential
Jr. Software Engineer
Responsibilities:
- Individually worked on all the stages of a Software Development Life Cycle (SDLC).
- Used JavaScript code, HTML and CSS style declarations to enrich websites.
- Integrated web services and worked with data on different servers.
- Wrote several Action Classes and Action Forms to capture user input and created different web pages using JSTL, JSP, HTML, Custom Tags and Struts Tags.
- Developed the front end of the site based on the MVC design pattern using the Struts framework.
- Used standard data access technologies like JDBC and ORM tools like Hibernate.
- Developed complex PL/SQL queries to access data.
- Understanding the requirements from business users and end users.
- Experience creating UML class and sequence diagrams.
- Converted XML into JAVA objects using JAXB API.
- Developed UI components using JSP and JavaScript.
- Wrote test cases using the JUnit testing framework and configured applications on WebLogic Server.
- Coordinated across multiple development teams for quick resolution to blocking issues.
Environment: Oracle, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat, PL/SQL, JIRA, SVN, MS Access, Microsoft Excel.