Big Data Developer Resume
Redmond, WA
SUMMARY
- Over 8 years of experience in Information Technology with a strong background in database development, database management, deployments, release management, implementing high availability, managing very large environments, application development, and data warehousing.
- Excellent understanding of Hadoop architecture, including HDFS and daemons such as the NameNode, DataNode, JobTracker, and TaskTracker.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, YARN, Sqoop, HBase, Impala, Solr, Elasticsearch, Oozie, ZooKeeper, Kafka, Spark, and Cassandra with the Cloudera and Hortonworks distributions.
- Created custom Solr query segments to improve search matching.
- Used Solr and MongoDB for querying and storing data.
- Extracted data from Cassandra through Sqoop, placed it in HDFS, and processed it.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala.
- Analysed Cassandra/SQL scripts and designed solutions implemented in Scala.
- Extracted and updated data in MongoDB using the mongoimport and mongoexport command-line utilities.
- Developed collections in MongoDB and performed aggregations on them.
- Familiar with EC2, S3, ELB, CloudWatch, SNS, and Elastic IPs, and with managing security groups and IAM on AWS.
- Hands-on experience in the installation, configuration, management, and deployment of big data solutions and the underlying Hadoop cluster infrastructure using the Cloudera and Hortonworks distributions.
- In-depth knowledge of data structures and the design and analysis of algorithms.
- Good understanding of Data Mining and Machine Learning techniques.
- Hands-on experience with various Hadoop distributions: IBM BigInsights, Cloudera, Hortonworks, and MapR.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
- Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and in performing data transformations using Spark Core.
- Expertise in developing real-time streaming solutions using Spark Streaming.
- Proficient in big data ingestion and streaming tools such as Flume, Sqoop, Spark, Kafka, and Storm.
- Hands-on experience in developing MapReduce programs using Apache Hadoop for analysing big data.
- Experience in systems programming (Ruby, Python, Bash).
- Expertise in implementing ad hoc MapReduce programs using Pig scripts.
- Experience in importing streaming data into HDFS using Flume sources and sinks, and transforming the data using Flume interceptors.
- Performed system integration testing to ensure the quality of the system. Familiar with TCP/IP, IPv4, and IPv6 protocols in environments providing multithreading, multitenancy, and high-availability support at the network layer.
- Exposure to Apache Kafka for developing log data pipelines as streams of messages using producers and consumers (a minimal sketch follows this list).
- Experience in integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
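A minimal sketch of the kind of Kafka log-pipeline producer described above, using the standard Kafka Java client; the broker list, topic name, and sample log line are illustrative placeholders rather than details from any specific project.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Minimal sketch: publish application log lines to a Kafka topic as a stream
// of messages. Broker list and topic name are hypothetical placeholders.
public class LogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");   // hypothetical brokers
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");   // wait for full acknowledgement of each record

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // In a real pipeline the lines would come from a log file or an agent source.
            String logLine = "2016-04-01T12:00:00Z INFO app started";
            producer.send(new ProducerRecord<>("app-logs", logLine));
        }   // close() flushes any pending records before exiting
    }
}
```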
TECHNICAL SKILLS
Big Data / Hadoop: HDFS, MapReduce v2, Pig, Hive, HBase, Oozie, Spark, Kafka, Storm, ZooKeeper, Flume
Java Technologies: Core Java, J2EE, Servlets, JSP, JNDI, JavaBeans, Hibernate, JavaScript, jQuery, JDBC, Applets, Swing, Struts
AWS Tools: EC2, VPC, Route 53, S3, IAM, CloudWatch, CloudTrail, Glacier, Elasticsearch
Programming Languages: C, C++, R, Java, Python, Scala, UNIX Shell Scripting
Databases: Oracle 11g, MySQL, DB2, Confidential SQL Server, MS Access
NoSQL Databases: MongoDB, HBase, Apache Cassandra, Amazon DynamoDB, Neo4j
Configuration Tools: Chef, Puppet, SaltStack, and Ansible
IDE tools: Eclipse, NetBeans, PyCharm, IntelliJ, Android Studio
Virtualization Technologies: VMware ESX server, Confidential Hyper-V Server
Operating Systems: Red Hat Linux, CentOS, Ubuntu, UNIX, Windows, AIX
PROFESSIONAL EXPERIENCE
Confidential - Redmond, WA
Big Data Developer
Responsibilities:
- Worked on analysing the Hadoop cluster and different big data analytic tools, including Pig, Hive, HBase, and Sqoop.
- Developed simple and complex MapReduce programs in Java for data analysis on different data formats.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Implemented Spark RDDs in Scala.
- Implemented different machine learning techniques using the Weka machine learning library.
- Developed Spark programs to analyse reports using machine learning models.
- Good exposure to development with HTML, Bootstrap, and Scala.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Strong experience in working with Elastic MapReduce (EMR) and setting up environments on AWS EC2 instances.
- Ability to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
- Collected data from AWS S3 buckets in near real time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
- Experienced in implementing static and dynamic partitioning in Hive.
- Experience in customizing the MapReduce framework at different levels, such as input formats and data types.
- Experience in using Flume to efficiently collect, aggregate and move large amounts of log data.
- Developed MapReduce Java programs with custom Writables to load web server logs into HBase using Flume.
- Responsible for importing data (mostly log files) from various sources into HDFS using Flume.
- Implemented an API tool to handle streaming data using Flume.
- Created Oozie workflows to automate data ingestion using Sqoop and process incremental log data ingested by Flume using Pig.
- Involved in migrating Hive queries and Hive UDFs to Spark SQL.
- Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports, and created Sqoop jobs based on the last saved value.
- Created Oozie workflows to run multiple Hive jobs.
- Created a customized BI tool for the management team that performs query analytics using Hive.
- Involved in data migration from an Oracle database to MongoDB.
- Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
- Used Elasticsearch for name pattern matching, customized to the requirement.
- Created Hive generic UDFs to process business logic that varies by policy (a minimal sketch follows this list).
- Experienced in using different compression techniques such as LZO and Snappy in Hive tables to save storage and optimize data transfer over the network.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Designed the ETL process and created the high-level design document, including logical data flows, the source data extraction process, database staging, job scheduling, and error handling.
- Designed and developed ETL jobs using Talend Integration Suite in Talend 5.2.2.
- Created ETL mappings with Talend Integration Suite to pull data from sources, apply transformations, and load data into the target database.
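A rough illustration of a policy-driven Hive generic UDF like those mentioned above; the class name, function name, and classification rule are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

// Minimal sketch of a Hive GenericUDF that classifies a policy code.
// The classification rule is a placeholder for the policy-specific logic.
public class PolicyCategoryUDF extends GenericUDF {

    private StringObjectInspector inputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        if (args.length != 1 || !(args[0] instanceof StringObjectInspector)) {
            throw new UDFArgumentException("policy_category() expects a single string argument");
        }
        inputOI = (StringObjectInspector) args[0];
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] args) throws HiveException {
        String code = inputOI.getPrimitiveJavaObject(args[0].get());
        if (code == null) {
            return null;
        }
        // Placeholder business rule: the real logic varied by policy.
        return code.startsWith("GRP") ? "GROUP" : "INDIVIDUAL";
    }

    @Override
    public String getDisplayString(String[] children) {
        return "policy_category(" + children[0] + ")";
    }
}
```

In practice the compiled class would be packaged into a jar, added to the Hive session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before being used in queries.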
Environment: Hadoop, Cloudera (CDH 4), HDFS, Hive, HBase, Flume, Sqoop, Pig, Kafka, Java, Eclipse, Teradata, Tableau, Talend, MongoDB, Ubuntu, UNIX, and Maven.
Confidential - Bellevue, WA
Hadoop Developer
Responsibilities:
- Worked on importing data from various sources and performed transformations using MapReduce and Hive to load data into HDFS.
- Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows.
- Worked on setting up Pig, Hive, and HBase on multiple nodes and developed using Pig, Hive, HBase, and MapReduce.
- Involved in data acquisition, data pre-processing, and data exploration for a telecommunications project in Scala.
- Experience with MapReduce coding.
- Solved the small-files problem using SequenceFile processing in MapReduce.
- Wrote various Hive and Pig scripts.
- Experience in upgrading CDH and HDP clusters.
- Used Flume, Sqoop, Hadoop, Spark, and Oozie to build data pipelines.
- Created HBase tables to store variable data formats coming from different portfolios.
- Experience in upgrading the Hadoop cluster (HBase/ZooKeeper) from CDH3 to CDH4.
- Performed real-time analytics on HBase using the Java API and REST API.
- Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis.
- Implemented complex MapReduce programs to perform joins on the map side using the distributed cache (a minimal sketch follows this list).
- Set up Flume for different sources to bring log messages from outside into HDFS.
- Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
- Worked on compression mechanisms to optimize MapReduce Jobs.
- Hands-on experience with real-time analytics and BI.
- Wrote Python scripts to parse XML documents and load the data into the database.
- Experienced in working with Avro data files using the Avro serialization system.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Unit tested and tuned SQL and ETL code for better performance.
- Monitored the performance and identified performance bottlenecks in ETL code.
- Exported the analysed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
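The map-side join bullet above refers to a common MapReduce pattern; the sketch below shows one way it is typically written against the Hadoop 2 API. The lookup-file layout, field positions, and class name are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Minimal sketch of a map-side join: a small lookup file distributed to every
// task via the distributed cache is loaded into memory in setup(), and each
// input record is joined against it in map(), so no reduce phase is needed.
// The file layout (accountId,accountName) is a hypothetical example.
public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> lookup = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // The driver registers the file with job.addCacheFile(...); YARN localizes it
        // and (typically) symlinks it into the task working directory by base name.
        Path cached = new Path(context.getCacheFiles()[0]);
        try (BufferedReader reader = new BufferedReader(new FileReader(new File(cached.getName())))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",", 2);
                lookup.put(parts[0], parts[1]);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",", 2);
        String accountName = lookup.get(fields[0]);          // join on the first column
        if (accountName != null) {
            context.write(new Text(fields[0]), new Text(accountName + "," + fields[1]));
        }
    }
}
```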
Environment: MapReduce, HBase, HDFS, Hive, Pig, Java, SQL, Cloudera Manager, Sqoop, Flume, ZooKeeper, YARN, Oozie, Eclipse
Confidential - Redmond, WA
Hadoop Developer
Responsibilities:
- Responsible for cluster maintenance, commissioning and decommissioning data nodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Developed and maintained the big data pipeline that transfers and processes data using Apache Spark.
- Responsible for migrating workloads from Hadoop MapReduce to Spark for in-memory distributed computing on real-time data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Worked in a DevOps model with Continuous Integration and Continuous Deployment (CI/CD), automating deployments using Jenkins and Ansible.
- Installed and configured HBase, Hive, Pig, Sqoop, Kafka, Oozie, Ansible, TLS, and Flume on HDP.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, Spark and loaded data into HDFS.
- Responsible for managing Chef client nodes and uploading cookbooks to the Chef server from the workstation.
- Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Maintained the Elasticsearch cluster and Logstash nodes to process around 5 TB of data daily from various sources such as Kafka and Kubernetes.
- Created and maintained real-time dashboards in Kibana (unique viewers, unique devices, click events, client errors, average bitrate, etc.).
- Designed, built, and managed the ELK (Elasticsearch, Logstash, Kibana) cluster providing centralized logging and search functionality for the application.
- Responsible for designing and deploying new ELK clusters (Elasticsearch, Logstash, Kibana, Beats, Kafka, ZooKeeper, etc.).
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
- Strong knowledge and hands-on experience in Talend.
- Migrated the required data from Oracle and MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
- Designed batch ingestion components using Sqoop scripts, and data integration and processing components using shell, Pig, and Hive scripts.
- Proposed an automated system using shell scripts to run the Sqoop jobs.
- Worked in an Agile development approach.
- Worked with Flume to move data from servers to HDFS.
- Implemented a Spark Streaming framework that processes data from Kafka and performs analytics on top of it (a minimal sketch follows this list).
- Developed a data pipeline for data processing using the Spark SQL API.
- Created the estimates and defined the sprint stages.
- Developed a strategy for full and incremental loads using Sqoop.
- Mainly worked on Hive queries to categorize data of different claims.
- Integrated the Hive warehouse with HBase.
- Wrote customized Hive UDFs in Java where the functionality was too complex.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase, and Hive).
- Monitored system health and logs and responded to any warning or failure conditions.
- Defined workflows and scheduled all the processes involved using Oozie.
- Exported the analysed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
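A minimal sketch of the Kafka-to-Spark-Streaming processing described above, based on the spark-streaming-kafka-0-10 direct stream API; the broker address, topic, batch interval, and record layout are illustrative placeholders.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

// Minimal sketch: consume a Kafka topic with Spark Streaming and run a simple
// aggregation per micro-batch. Broker, topic, and interval are placeholders.
public class ClaimsStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("ClaimsStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");          // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "claims-consumer");
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Arrays.asList("claims");           // hypothetical topic

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Count records per claim type (assumed to be the first comma-separated field).
        stream.map(record -> record.value().split(",")[0])
              .countByValue()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```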
Environment: Hortonworks HDP 2.3, MapReduce v2 (YARN), HDFS, Hive, Pig, Java, SQL, Sqoop, Oracle, MySQL, Tableau, Talend, Elasticsearch, Oozie, Spark Core, Spark SQL, Spark Streaming, Kafka, Flume, Eclipse
Confidential - Richardson, TX
Hadoop Developer
Responsibilities:
- Involved in migrating the existing SQL code to the data lake and sending the extracted reports to consumers.
- Worked on data mapping to understand the source-to-target mapping rules.
- Analysed the requirements and framed the business logic and implemented it using Talend.
- Worked on the design, development and testing of Talend mappings.
- Created ETL job infrastructure using Talend Open Studio.
- Performed data analysis and data profiling using SQL on various extracts.
- Used SQL to query databases, performing various validation and mapping activities.
- Effectively interacted with business analysts and data modellers, and defined mapping documents and design processes for various sources and targets.
- Designed and implemented SQL queries for data analysis and data validation, and to compare data between the test and production environments (a minimal sketch follows this list).
- Created reports of analysed data using Apache Hue and the Hive browser, and generated graphs for data analytics.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Imported and exported data into HDFS and Hive using Sqoop.
- Created partitioned tables in Hive and used Hive to analyse the partitioned and bucketed data and compute various metrics for reporting.
- Documented requirements in Jira as a backlog of user stories for the team.
- Led grooming sessions, sprint planning, retrospectives, and daily stand-ups with the teams in the absence of the Scrum Master.
- Worked in the complete Software Development Life Cycle (analysis, design, development, testing, implementation, and support) using Agile methodologies (Jira).
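One way the test-versus-production comparison above can be expressed programmatically is through the Hive JDBC driver; since the resume does not specify the validation queries, the connection URLs, table name, and row-count check below are illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch: compare row counts of the same table in the test and
// production Hive environments via JDBC. URLs and table name are placeholders.
public class RowCountValidation {

    private static long countRows(String jdbcUrl, String table) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + table)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");    // HiveServer2 JDBC driver

        String table = "claims_extract";                      // hypothetical table
        long testCount = countRows("jdbc:hive2://test-host:10000/default", table);
        long prodCount = countRows("jdbc:hive2://prod-host:10000/default", table);

        System.out.printf("test=%d prod=%d match=%b%n", testCount, prodCount, testCount == prodCount);
    }
}
```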
Environment: Apache Hive, SQL, Talend Open Studio, Sqoop, Apache Hue, Excel, pivot tables, etc.
Confidential
Software Developer
Responsibilities:
- Involved in requirements gathering and analysis from the existing system.
- Worked with Agile Software Development.
- Designed and developed business components using Spring AOP, Spring IOC, and Spring Batch.
- Implemented the DAO layer using Hibernate and AOP, and the service layer using Spring and the MVC design pattern.
- Developed Java server components using Spring, Spring MVC, Hibernate, and web services technologies.
- Used Java 1.7 (generics, the enhanced for loop, static imports, annotations, etc.), J2EE, Servlets, JSP, JDBC, Spring 3.1 RC1, Hibernate, web services (Axis, JAX-WS, JAXP, JAXB), and JavaScript frameworks (Dojo, jQuery, AJAX, XML, Schema).
- Used Hibernate as persistence framework for DAO layer to access the database.
- Worked with the JavaScript framework AngularJS.
- Designed and developed RESTful APIs for different modules in the project as per the requirements (a minimal sketch follows this list).
- Developed a TCP interface using Java socket APIs to communicate with a BitTorrent server.
- Designed and developed an environment where two microservices with separate TCP/IP networking stacks had to coexist, making the existence of the two stacks transparent to applications.
- Developed JSP pages using Custom tags and Tiles framework.
- Developed the User Interface Screens for presentation logic using JSP and HTML.
- Used Spring IoC to inject services and their dependencies through the dependency injection mechanism.
- Developed SQL queries to interact with the SQL Server database and also wrote PL/SQL code for procedures and functions.
- Developed the persistence layer (DAL) and the presentation layer.
- Created AngularJS controllers, directives, and models for different modules in the frontend.
- Used Maven as the build framework and Jenkins for the continuous build system.
- Developed the GUI using front-end technologies: JSP, JSTL, AJAX, HTML, CSS, and JavaScript.
- Developed code for web services using XML and SOAP, used the SoapUI tool for testing the services, and tested web page functionality and raised defects.
- Involved in writing the Spring configuration XML file that contains the declarations through which business classes are wired up to the frontend managed beans using the Spring IoC pattern.
- Configured and deployed the application using Tomcat and WebLogic.
- Used design patterns such as Business Object (BO), Service Locator, Session Façade, Model-View-Controller, DAO, and DTO.
- Used Log4j to write info, warning, and error data to the logs.
- Involved in writing JUnit test cases as part of unit testing.
- Prepared auto-deployment scripts for WebLogic in a UNIX environment.
- Used Java messaging artifacts (JMS) to send automated notification emails to the respective users of the application.
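A minimal sketch of a Spring MVC controller exposing a RESTful endpoint, in keeping with the Spring 3.1-era stack listed below; the /accounts resource and its fields are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

// Minimal sketch: a Spring MVC controller exposing a read-only REST endpoint.
// The "/accounts" resource and its fields are hypothetical.
@Controller
@RequestMapping("/accounts")
public class AccountController {

    // GET /accounts/{id} returns a JSON body via @ResponseBody
    // (a Jackson message converter is assumed to be on the classpath).
    @RequestMapping(value = "/{id}", method = RequestMethod.GET)
    @ResponseBody
    public Map<String, Object> getAccount(@PathVariable("id") long id) {
        Map<String, Object> account = new LinkedHashMap<>();
        account.put("id", id);
        account.put("status", "ACTIVE");   // placeholder data; a real service call would go here
        return account;
    }
}
```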
Environment: Java, J2EE, Spring Core, Spring Data, Spring MVC, Spring AOP, Spring Batch, Spring Scheduler, RESTful Web Services, SOAP Web Services, Hibernate, Eclipse IDE, AngularJS, JSP, JSTL, HTML5, CSS, JavaScript, WebLogic, Tomcat, XML, XSD, UNIX, Linux, UML, Oracle, Maven, SVN, SOA, design patterns, JMS, JUnit, Log4j, WSDL, JSON, JNDI.