Sr. Hadoop/Spark Developer Resume
Chicago, IL
PROFESSIONAL SUMMARY:
- Extensive IT experience of over 9 years in analysis, design, development, implementation, maintenance, and support, with experience in developing strategic methods for deploying Big Data technologies to efficiently solve Big Data processing requirements.
- Around 5 years of experience in Big Data using the Hadoop framework and related technologies such as HDFS, MapReduce, Hive, Pig, YARN, Apache Spark, Flume, Kafka, Oozie, Sqoop, Zookeeper, and NoSQL databases like HBase and Cassandra.
- Worked extensively on Hadoop (Gen-1 and Gen-2) and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and ResourceManager (YARN).
- Experience in working with Amazon EMR, Cloudera (CDH4/CDH5), and Hortonworks Hadoop distributions.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Extensively used Apache Sqoop for efficiently importing and exporting data between HDFS and relational database systems.
- Worked on data loads from various sources (Oracle, MySQL, DB2, MS SQL Server, Cassandra) into Hadoop using Sqoop and Python scripts.
- Experience in developing data pipelines using Sqoop and Flume to extract data from weblogs and store it in HDFS.
- Experience in managing and reviewing Hadoop log files using Flume and Kafka; developed Pig UDFs and Hive UDFs to pre-process data for analysis. Worked on Impala for massively parallel processing of Hive queries.
- Extended Hive and Pig core functionality using custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs).
- Efficient in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing strategies, and writing and optimizing HiveQL queries.
- Experience in ingestion, storage, querying, processing and analysis of Big Data with hands on experience in Big Data including Apache Spark, Spark SQL and Spark Streaming.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Worked with the Spark engine to process large-scale data; experienced in creating Spark RDDs and developing Spark Streaming jobs using RDDs and the Spark shell.
- Experienced with the RDD architecture, implementing Spark operations on RDDs, and optimizing transformations and actions in Spark.
- Hands-on experience developing Apache Spark jobs using Scala in test environments for faster data processing, and used Spark SQL for querying.
- Experienced with the Spark Streaming API to ingest data into the Spark engine from Kafka.
- Worked on real-time data integration using a Kafka-Storm data pipeline, Spark Streaming, and HBase.
- Experienced in implementing unified data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
- Exposure to Data Lake implementation using Apache Spark; developed data pipelines and applied business logic using Spark.
- Good working experience on different file formats (CSV, Sequence files, XML, JSON, PARQUET, TEXTFILE, AVRO, ORC) and different compression codecs (GZIP, SNAPPY, LZO).
- Hands on experience with NoSQL Databases like HBase, Cassandra and relational databases like Oracle, DB2, SQL SERVER and MySQL.
- Expertise in job scheduling and monitoring tools like Oozie and ZooKeeper and experience in designing Oozie workflows for cleaning data and storing into Hive tables for quick analysis.
- Strong experience in working with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances, S3, and Azure.
- Installed and configured Jenkins for automating deployments and providing automation solutions.
- Developed build and deployment scripts using Ant and Maven as build tools in Jenkins to move artifacts from one environment to another.
- Extensive experience in ETL Data Ingestion, In-Stream data processing, Batch Analytics and Data Persistence Strategy. Worked extensively with Dimensional Modeling, Data Migration, Data Cleansing, Data Transformation, and ETL Processes features for Data Warehouse System.
- Experience in creating Tableau dashboards with relational and multi-dimensional databases including Oracle, MySQL, and Hive, gathering and manipulating data from various sources. Experienced in performance tuning of Tableau dashboards and reports.
- Experience in understanding the security requirements for Hadoop and integrate with Kerberos authentication and authorization infrastructure.
- Experience in Object-Oriented Analysis and Design (OOAD) and development of software using UML methodology, with good knowledge of J2EE design patterns and Core Java design patterns.
- Expertise in design and development of web applications involving J2EE technologies with Java, Spring, EJB, AJAX, Servlets, JSP, Struts, Web Services, XML, JMS, UNIX shell scripts, MS SQL Server, and SOAP and RESTful web services.
- Extensive development experience in IDEs such as Eclipse and NetBeans.
- Experience in core Java and JDBC, and proficient in using Java APIs for application development.
- Experience in deploying web applications using application servers WebLogic, Apache Tomcat, WebSphere, and JBoss.
- Experience in all stages of SDLC (Agile, Waterfall), writing Technical Design document, Development, Testing and Implementation of Enterprise level Data mart and Data warehouses.
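The RDD transformation/action pattern mentioned above (map, flatMap, reduceByKey) can be sketched without a Spark cluster. The following is a plain-Python analogue of a Spark-style word count over hypothetical log lines; the data and names are illustrative, not project code:

```python
from itertools import groupby

# Hypothetical log lines standing in for an RDD of text records.
lines = ["error disk full", "info job done", "error timeout"]

# Transformations (lazy in Spark; eager here): flatMap -> map -> reduceByKey.
words = (w for line in lines for w in line.split())        # flatMap
pairs = sorted((w, 1) for w in words)                      # map to (key, 1), then shuffle/sort
counts = {k: sum(v for _, v in g)                          # reduceByKey
          for k, g in groupby(pairs, key=lambda p: p[0])}

print(counts["error"])  # 2
```

In Spark the same chain would stay lazy until an action such as `collect()` or `count()` triggers execution.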
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop HDFS, MapReduce, Hive, Sqoop, Pig, HBase, Kafka, Flume, Spark, Scala, Impala, Oozie, NiFi, Zookeeper, YARN, Talend and Tableau/QlikView
Operating Systems: Windows, Linux, UNIX, Ubuntu, CentOS
Programming or Scripting Languages: C, C++, Core Java/J2EE, Unix Shell Scripting, Python, SQL, Pig Latin, HiveQL, Scala
Hadoop Distributions: Cloudera (CDH4/CDH5), Hortonworks (HDP 2.5)
IDE/GUI: Eclipse 3.2, IntelliJ, Scala IDE
Build Tools: Jenkins, Maven, Ant
Databases: Microsoft SQL Server, Oracle 11g/10g, DB2, MySQL, MS Access, NoSQL (HBase, Cassandra)
Cloud Computing Tools: Amazon AWS, Azure
Version Control/Issue Tracking: JIRA, CVS, SVN, and GitHub
SDLC Methodologies: Agile, Scrum, Waterfall Model
PROFESSIONAL EXPERIENCE:
Sr. Hadoop/Spark Developer
Confidential - Chicago, IL
Responsibilities:
- Involved in Requirement gathering, Business Analysis and translated business requirements into Technical design in Hadoop and Big Data.
- Developed Spark code using Scala, DataFrames, and Spark SQL for faster processing of data.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
- Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Scala. Developed Scala and Spark SQL code to extract data from various databases.
- Used Spark SQL to process large volumes of structured data and implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
- Used Spark Streaming with Apache Kafka for real-time data processing; along with the infrastructure team, involved in designing and developing a Kafka and Storm based data pipeline.
- Developed data pipeline using Spark, Hive, Sqoop and Kafka to ingest customer behavioral data into Hadoop platform for analysis.
- Worked on implementation of a log producer in Scala that watches for application logs, transforms incremental logs and sends them to a Kafka and Zookeeper based log collection platform.
- Extensively worked with unstructured, semi-structured, and structured data.
- Used Spark Streaming APIs to perform necessary transformations and actions on data received from Kafka and persisted it into HDFS.
- Developed multiple Kafka producers and consumers as per the software requirement specifications.
- Involved in the process of data acquisition, data pre-processing and data exploration of project in Spark using Scala.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, and other transformations during the ingestion process itself.
- Worked on data serialization formats for converting complex objects into sequences of bytes using Avro, Parquet, CSV, and JSON formats.
- Developed Spark jobs using Scala in test environments for faster testing and data processing, and used Spark SQL for querying and accessing Hive tables from Spark. Performed map-side joins on RDDs, Spark SQL, and DataFrames.
- Worked on creating Hive tables and written Hive queries for data analysis to meet business requirements.
- Created Hive tables, partitions and implemented incremental imports to perform ad-hoc queries on structured data.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Migrated the required data from Oracle and MySQL into HDFS using Sqoop, and imported flat files of various formats into HDFS.
- Used Oozie and Zookeeper for workflow scheduling and monitoring, coordinating the cluster and scheduling data pipeline workflows.
- Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS using Sqoop.
- Developed a data pipeline using Spark and Hive to ingest, transform and analyze data.
- Actively participated in the software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
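The Spark Streaming work above treats a continuous Kafka feed as a sequence of micro-batches. A minimal plain-Python sketch of that micro-batch model (hypothetical click events and batch size; real code would use a Kafka direct stream and a batch interval):

```python
from collections import Counter

# Hypothetical stream of click events; Spark Streaming would pull these from Kafka.
stream = ["add_to_cart", "view", "view", "purchase", "view", "add_to_cart"]

BATCH_SIZE = 3  # stand-in for a time-based batch interval

def micro_batches(events, size):
    """Yield fixed-size micro-batches, mirroring one batch per streaming interval."""
    for i in range(0, len(events), size):
        yield events[i:i + size]

# Aggregate each batch, then fold it into a running (stateful) total,
# analogous to updateStateByKey in Spark Streaming.
running = Counter()
for batch in micro_batches(stream, BATCH_SIZE):
    running.update(Counter(batch))

print(running["view"])  # 3
```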
Environment: Apache Spark, DataFrames, Cloudera (CDH 5.8), Scala, Apache Kafka, SBT, HUE, Apache Sqoop, Oozie, Zookeeper, Cloudera Manager, HDFS, GitHub, Maven.
Hadoop Developer
Confidential - St. Louis, MO
Responsibilities:
- Worked on Hadoop cluster and data querying tools Hive to store and retrieve data.
- Involved in the complete Software Development Life Cycle (SDLC) while developing applications.
- Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
- Developed Oozie workflows for scheduling ETL processes and Hive scripts.
- Started using Apache NiFi to copy data from the local file system to HDFS.
- Worked with teams to analyze anomaly detection and data ratings.
- Implemented custom input format and record reader to read XML input efficiently using SAX parser.
- Involved in writing queries in Spark SQL using Scala. Worked with Splunk to analyze and visualize data.
- Analyzed the database and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
- Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities on the network.
- Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
- Worked with the RDD architecture, implementing Spark operations on RDDs and optimizing transformations and actions in Spark.
- Involved in working with Impala for data retrieval process.
- Exported data from Impala to Tableau reporting tool, created dashboards on live connection.
- Designed multiple Python packages that were used within a large ETL process used to load 2TB of data from an existing Oracle database into a new PostgreSQL cluster
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD's.
- Loaded data from Linux file system to HDFS and vice-versa
- Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, and exported results back into OLTP systems through Sqoop.
- Built a POC enabling member and suspect search using Solr.
- Worked on ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
- Used CSVExcelStorage in Pig to parse files with different delimiters.
- Installed and monitored Hadoop ecosystem tools on multiple operating systems such as Ubuntu and CentOS.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
- Modified reports and Talend ETL jobs based on feedback from QA testers and users in development and staging environments. Involved in setting up the QA environment by implementing Pig and Sqoop scripts.
- Worked on Apache NiFi: executed Spark and Sqoop scripts through NiFi, created scatter-and-gather patterns, ingested data from Postgres to HDFS, fetched Hive metadata and stored it in HDFS, and created a custom NiFi processor for filtering text from flow files.
- Responsible for designing and implementing ETL processes using Talend to load data; worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems/mainframes.
- Developed Pig Latin scripts to do operations of sorting, joining and filtering enterprise data.
- Implemented test scripts to support test driven development and integration.
- Developed multiple MapReduce jobs in java to clean datasets.
- Involved in loading data from Linux file systems, servers, and Java web services using Kafka producers and consumers.
- Involved in developing code to write canonical-model JSON records from numerous input sources to Kafka queues.
- Performed streaming of data into Apache Ignite by setting up caches for efficient data analysis.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Developed UNIX shell scripts for creating the reports from Hive data.
- Manipulated, serialized, and modeled data in multiple forms such as JSON and XML. Involved in setting up MapReduce 1 and MapReduce 2.
- Prepared Avro schema files for generating Hive tables; created Hive tables, loaded data into them, and queried the data using HQL.
- Installed and Configured Hadoop cluster using Amazon Web Services (AWS) for POC purposes.
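The canonical-model JSON work above amounts to mapping heterogeneous source records onto one shared schema before queueing them to Kafka. A small illustrative sketch (field names, sources, and the `to_canonical` helper are hypothetical, not the project's actual model):

```python
import json

# Hypothetical raw records from two different input sources.
source_a = {"userId": "42", "evt": "login", "ts": "2016-01-01T00:00:00Z"}
source_b = {"user": 42, "event_type": "login", "timestamp": "2016-01-01T00:00:00Z"}

def to_canonical(record):
    """Map heterogeneous field names onto one canonical model before queueing."""
    return {
        "user_id": int(record.get("userId", record.get("user"))),
        "event": record.get("evt", record.get("event_type")),
        "timestamp": record.get("ts", record.get("timestamp")),
    }

# In production, these JSON strings would be published to a Kafka topic.
canon_a = json.dumps(to_canonical(source_a), sort_keys=True)
canon_b = json.dumps(to_canonical(source_b), sort_keys=True)
print(canon_a == canon_b)  # True: both sources yield the same canonical record
```

Sorting keys on serialization keeps byte-identical output for identical records, which simplifies downstream deduplication.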
Environment: Hadoop MapReduce 2 (YARN), NiFi, HDFS, Pig, Hive, Flume, Cassandra, Eclipse, Ignite, Core Java, Sqoop, Spark, Splunk, Maven, Spark SQL, Cloudera, Solr, Talend, Linux shell scripting.
Java/Hadoop Developer
Confidential - Chicago, IL
Responsibilities:
- Exported data from DB2 to HDFS using Sqoop and Developed MapReduce jobs using Java API.
- Designed and implemented Java engine and API to perform direct calls from front-end JavaScript (ExtJS) to server-side Java methods (ExtDirect).
- Used Spring AOP to implement Distributed declarative transaction throughout the application.
- Designed and developed Java batch programs in Spring Batch.
- Worked on Data Lake architecture to build a reliable, scalable, analytics platform to meet batch, interactive and on-line analytics requirements
- Well-versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming.
- Developed MapReduce programs to remove irregularities and aggregate the data.
- Implemented Hive UDF's and did performance tuning for better results
- Developed Pig Latin scripts to extract data from log files and store it in HDFS. Created User Defined Functions (UDFs) to pre-process data for analysis.
- Implemented optimized map joins to get data from different sources to perform cleaning operations before applying the algorithms.
- Experience in using Sqoop to import and export the data from Oracle DB into HDFS and HIVE.
- Implemented CRUD operations on HBase data using thrift API to get real time insights.
- Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster for generating reports on nightly, weekly and monthly basis.
- Used various compression codecs to effectively compress the data in HDFS.
- Used Avro SerDes for serialization and deserialization, and implemented custom Hive UDFs involving date functions.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
- Installed and configured Pig and wrote Pig Latin scripts.
- Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts.
- Developed workflows using Oozie for running MapReduce jobs and Hive queries.
- Imported and exported data into HDFS and assisted in exporting analyzed data to RDBMS using Sqoop.
- Involved in loading data from UNIX file system to HDFS.
- Created java operators to process data using DAG streams and load data to HDFS.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Involved in developing monitoring and performance metrics for Hadoop clusters.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
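The custom Hive date UDFs mentioned above typically normalize source date strings into Hive's `yyyy-MM-dd` format. A plain-Python sketch of that logic (illustrative only; a real Hive UDF would be a Java class extending `UDF`, and the input format here is assumed):

```python
from datetime import datetime

def to_hive_date(raw, in_fmt="%d/%m/%Y"):
    """Normalize a source date string to Hive's yyyy-MM-dd format.
    Mirrors what a custom Hive date UDF would do, in Python for illustration."""
    return datetime.strptime(raw, in_fmt).strftime("%Y-%m-%d")

print(to_hive_date("25/12/2015"))  # 2015-12-25
```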
Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, Spark, MySQL, Ubuntu, Zookeeper, Maven, Jenkins, Java (JDK 1.6), Oracle 10g.
Java Developer
Confidential
Responsibilities:
- Involved in the design, development and deployment of the Application using Java/J2EE Technologies.
- Performed Requirements gathering and analysis and prepared Requirements Specifications document. Provided high level systems design specifying the class diagrams, sequence diagrams and activity diagrams
- Involved in designing user interactive web pages as the front-end part of the web application using various web technologies like HTML, JavaScript, Angular JS, AJAX and implemented CSS for better appearance and feel.
- Integrated AEM to the existing web application and created AEM components using JavaScript, CSS and HTML.
- Programmed Oracle SQL and T-SQL stored procedures, functions, triggers, and packages as back-end processes to create and update staging tables, log and audit tables, and to create primary keys.
- Provided further maintenance and support, working with the client to solve their problems, including major bug fixes.
- Deployed and tested the application using Tomcat web server.
- Analysis of the specifications provided by the clients.
- Developed JavaBean components utilizing AWT and Swing classes.
- Extensively used Transformations like Aggregator, Router, Joiner, Expression, Lookup, Update Strategy, and Sequence Generator.
- Used Exception handling and Multi-threading for the optimum performance of the application.
- Used the Core Java concepts to implement the Business Logic.
- Provided on call support based on the priority of the issues.
- Designed and implemented a generic parser framework using SAX parser to parse XML documents which stores SQL.
- Perform Functional testing, Performance testing, Integration testing, Regression testing, Smoke testing and User Acceptance Testing (UAT).
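The SAX-based parser framework above streams XML rather than loading a full DOM, which keeps memory flat for large documents. A minimal sketch of the idea in Python's stdlib SAX API (the element names and sample document are hypothetical, not the original framework):

```python
import xml.sax

class SqlExtractor(xml.sax.ContentHandler):
    """Collect the text of every <sql> element while streaming the document."""
    def __init__(self):
        super().__init__()
        self.queries = []
        self._buf = None  # accumulates text only while inside <sql>

    def startElement(self, name, attrs):
        if name == "sql":
            self._buf = []

    def characters(self, content):
        # SAX may deliver element text in several chunks; buffer them.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "sql":
            self.queries.append("".join(self._buf).strip())
            self._buf = None

# Hypothetical document of the kind such a parser framework would handle.
doc = "<queries><sql>SELECT 1</sql><sql>SELECT 2</sql></queries>"
handler = SqlExtractor()
xml.sax.parseString(doc.encode(), handler)
print(handler.queries)  # ['SELECT 1', 'SELECT 2']
```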
Environment: Core Java, Servlets, Struts, JSP, XML, XSLT, JavaScript, Apache, Oracle 10g/11g.
Jr Java Developer
Confidential
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project. Developed web pages using HTML, JSP, CSS, and JavaScript.
- Extensively used Core Java, Servlets, JSP and XML.
- Worked with the testing team in creating new test cases and created the use cases for the module before the testing phase.
- Followed Scrum development cycle for streamline processing with iterative and incremental development.
- Implemented the presentation layer with HTML and JavaScript.
- Developed web components using JSP, Servlets and JDBC.
- Implemented database using SQL Server.
- Designed error logging flow and error handling flow.
- Used Apache log4j Logging framework for logging.
- Designed tables and indexes.
- Wrote complex SQL and stored procedures.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Developed user and technical documentation.
Environment: Java 1.6, Core Java, JSP, Servlets, JDBC, HTML, JavaScript, MySQL, JUnit, Eclipse IDE.
