Big Data Hadoop/Data Developer Resume
Arlington, VA
SUMMARY
- Around 8 years of IT experience in a variety of industries, including hands-on experience in Big Data Hadoop and Java development.
- Expertise with the tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Excellent knowledge of Hadoop ecosystem components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and the MapReduce programming paradigm.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Strong experience in writing Python applications using libraries such as Pandas, NumPy, SciPy and Matplotlib.
- Good knowledge of machine learning algorithms and concepts in Python, such as data preprocessing, regression, classification, and appropriate model selection techniques.
- Good exposure to the Agile software development process.
- Experience in manipulating and analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Strong experience with Hadoop distributions like Cloudera, MapR and Hortonworks.
- Good understanding of NoSQL databases and hands-on work experience in writing applications on NoSQL databases like HBase, Cassandra and MongoDB.
- Experienced in writing complex MapReduce programs that work with different file formats like Text, Sequence, XML, Parquet and Avro.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
- Experience in migrating data using Sqoop from HDFS to relational database systems and vice versa.
- Extensive experience in importing and exporting data using stream processing platforms like Flume and Kafka.
- Good understanding of Teradata, Zeppelin and SOLR.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP and RESTful web services.
- Strong Experience of Data Warehousing ETL concepts using Informatica Power Center, OLAP, OLTP and AutoSys.
- Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries for Oracle.
- Experienced in working with Amazon Web Services (AWS), using EC2 for computing and S3 as the storage mechanism.
- Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
- Excellent implementation knowledge of Enterprise/Web/Client Server applications using Java and J2EE.
- Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
- Worked in large and small teams for systems requirement, design & development.
- Key participant in all phases of the software development life cycle, including Analysis, Design, Development, Integration, Implementation, Debugging, and Testing of software applications in client-server and object-oriented environments.
- Experience in using various IDEs (Eclipse, IntelliJ) and repositories (SVN and Git).
- Experience of using build tools Ant, Maven.
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, Map Reduce, HIVE, PIG, HBase, Sqoop, Flume, Oozie, Spark, Storm, Kafka, HCatalog, Impala, Datameer.
Distributed Platforms: Cloudera, Hortonworks, MapR and Apache
Languages: C, C++, Java, Scala, SQL, PL/SQL, Linux shell scripts, HL7.
NoSQL Databases: MongoDB, Cassandra, HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Methodology: Agile/Scrum, Rational Unified Process and Waterfall
Monitoring tools: Ganglia, Nagios.
Hadoop/BigData Technologies: HDFS, Map Reduce, Spark SQL, Sqoop, Flume, Pig, Hive, Oozie, Impala, Zookeeper and Cloudera Manager, MongoDB, NoSQL database HBase
Version Control: GitHub, Bitbucket, CVS, SVN, ClearCase, Visual SourceSafe
Build & Deployment Tools: Maven, ANT, Hudson, Jenkins
Database: Oracle, MS SQL Server 2005, MySQL, Teradata
PROFESSIONAL EXPERIENCE
Confidential, Arlington, VA
BigData Hadoop/Data Developer
Responsibilities:
- Developing and maintaining a Data Lake containing regulatory data for federal reporting with big data technologies such as Hadoop Distributed File System (HDFS), Apache Impala, Apache Hive and the Cloudera distribution.
- Developing ETL jobs to extract data from sources like Oracle and Microsoft SQL Server, transform the extracted data using Hive Query Language (HQL), and load it into the Hadoop Distributed File System (HDFS).
- Importing data using Sqoop into Hive and HBase from the existing SQL Server.
- Fixing data-related issues within the Data Lake.
- Importing data from different sources into HDFS using Sqoop, applying transformations using Hive and Spark, and then loading the data into Hive tables.
- Developed DataFrames and case classes for the required input data and performed the data transformations using Spark Core.
- Designed the HBase schemas based on the requirements and performed HBase data migration and validation.
- Implementing new functionality in the Data Lake using big data technologies such as Hadoop Distributed File System (HDFS), Apache Impala and Apache Hive based on the requirements provided by the client.
- Expertise in deployment of Hadoop YARN, Spark and Storm integration with Cassandra, Ignite, Kafka, etc.
- Communicating regularly with the business teams and the project manager to ensure that any gaps between the client’s requirements and the project’s technical requirements are resolved.
- Developing Python scripts using Hadoop Distributed File System APIs to generate cURL commands to migrate data and to prepare different environments within the project.
- Monitoring production jobs using Control-M on a daily basis.
- Coordinating the production releases with the change management team using the Remedy tool.
- Communicating effectively with team members and conducting code reviews.
Environment: Hadoop, Data Lake, Python, Hive, Spark, HBase, Sqoop, Cassandra, ETL Informatica, Cloudera, Oracle 10g, Microsoft SQL Server, Control-M, Linux
Confidential, San Antonio, TX
BigData Hadoop/Spark Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Migrated Pig scripts and MapReduce jobs to the Spark DataFrames API and Spark SQL to improve performance.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
- Expertise in performance tuning of Spark applications, including setting the right batch interval time, the correct level of parallelism, and memory tuning.
- Developed DataFrames and case classes for the required input data and performed the data transformations using Spark Core.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop; also developed applications using Scala.
- Expertise in deployment of Hadoop YARN, Spark and Storm integration with Cassandra, Ignite, Kafka, etc.
- Strong working experience on Cassandra for retrieving data from Cassandra clusters to run queries.
- Developed a POC using Scala, deployed it on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Deployed and maintained multi-node Dev and Test Kafka clusters.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Experience in using Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
- Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Developed equivalent Spark Scala code for existing SAS code to extract summary insights on the Hive tables.
- Responsible for importing data from different sources like MySQL databases into HDFS and saving it in Avro and JSON file formats.
- Experience in importing data from S3 to Hive using Sqoop and Kafka.
- Good experience working with Amazon AWS for accessing Hadoop cluster components.
- Involved in creating partitioned Hive tables and in loading and analyzing data using Hive queries; implemented partitioning and bucketing in Hive.
- Worked on a POC to compare the processing time of Impala with Apache Hive for batch applications in order to implement the former in the project.
- Developed Hive queries to process the data and generate the data cubes for visualization.
- Good experience with Talend Open Studio for designing ETL jobs for data processing.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Configured Hadoop clusters and coordinated with Big Data admins for cluster maintenance.
Environment: Hadoop, YARN, Spark-Core, Spark-Streaming, Spark-SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elastic Search, Impala, Cassandra, Tableau, Informatica, Cloudera, Oracle 10g, Linux.
Confidential, Riverwoods, IL
Hadoop Developer
Responsibilities:
- Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Responsible for installing, configuring, supporting, and managing Hadoop clusters.
- Importing data into HDFS from an Oracle 10.2 database and exporting it back using Sqoop.
- Installed and configured Pig and wrote Pig Latin scripts.
- Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.
- Created HBase tables and column families to store the user event data.
- Wrote automated HBase test cases for data quality checks using HBase command-line tools.
- Developed a data pipeline using HBase and Hive to ingest, transform and analyze customer behavioral data.
- Experience in collecting log data from different sources (such as web servers and social media) using Flume and storing it on HDFS to run MapReduce jobs.
- Handled importing of data from machine logs using Flume.
- Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and the AWS cloud.
- Configured, monitored, and optimized a Flume agent to capture web logs from the VPN server and load them into the Hadoop Data Lake.
- Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and wrote Pig/Hive UDFs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Exported the analyzed data to the relational databases using Sqoop to further visualize and generate reports for the BI team.
- Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
- Wrote Java code to format XML documents and upload them to the Solr server for indexing.
- Used NoSQL technology (Amazon DynamoDB) to gather and track event-based metrics.
- Maintained all the services in the Hadoop ecosystem using ZooKeeper.
- Designed and implemented Spark jobs to support distributed data processing.
- Expertise in Extraction, Transformation, loading data from Oracle, DB2, SQL Server, MS Access, Excel, Flat Files and XML using Talend.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Helped design scalable Big Data clusters and solutions.
- Followed Agile methodology for the entire project.
- Involved in review of functional and non-functional requirements.
- Involved in Hadoop cluster tasks like adding and removing nodes without any effect on running jobs and data.
- Developed workflows using Oozie to automate the tasks of loading data into HDFS and pre-processing with Pig.
- Developed interactive shell scripts for scheduling various data cleansing and data loading process.
- Converted the existing relational database model to the Hadoop ecosystem.
Environment: Hadoop, HDFS, Pig, Hive, Flume, Sqoop, Oozie, Python, Shell Scripting, SQL, Talend, Spark, HBase, Elasticsearch, Linux (Ubuntu), Kafka.
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Developed and deployed a Hadoop cluster using Pig, Hive, HBase, Oozie, Sqoop, Spark, Impala, and Kafka.
- Worked with Sqoop to import and export data into HDFS, Hive, and HBase.
- Implemented MapReduce jobs in Java for data processing.
- Used Sqoop imports to load and transform large sets of data into HDFS from relational databases.
- Experience in writing Map Reduce programs for Data Analysis using Java.
- Involved in integrating Apache Kafka and Apache Storm.
- Performed data transformation and a few pre-aggregations using Pig before storing the data on HDFS.
- Created structured data from a pool of unstructured data using Spark.
- Used the in-memory computing capabilities of Spark to perform procedures such as text analysis and processing using Scala.
- Experience working with Spark Streaming, dividing data into different branches for batch processing through the Spark engine.
- Worked on parallel processing using MapReduce and Spark.
- Created Hive UDFs using Java.
- Loading data from disparate data sets using Flume and Sqoop.
- Worked on job scheduling using Oozie workflow engine.
- Worked on installation, support, and monitoring of Hadoop clusters using Cloudera Manager.
- Worked with the Cloudera distributions.
- Worked with Apache Kafka to collect, aggregate and move large amounts of data from application servers.
- Worked on integrating Cassandra with Elasticsearch and Hadoop.
- Involved in creating HBase tables to store data from UNIX and NoSQL.
- Troubleshooting and debugging runtime issues in the Hadoop ecosystem.
- Involved in integrating algorithms into the production system by working with the engineering team.
Environment: Java, Hadoop, MapReduce, Pig, Hive, Oozie, Sqoop, Spark, Cloudera, Kafka, Cassandra, HBase.
Confidential, Cambridge, MA
Java/Hadoop Developer
Responsibilities:
- Developed JSP, JSF and Servlets to dynamically generate HTML and display the data on the client side.
- Used the Hibernate framework for persistence to the Oracle database.
- Wrote and debugged the Ant scripts for building the entire web application.
- Developed web services in Java using SOAP and WSDL, and used WSDL to publish the services to another application.
- Implemented Java Message Services (JMS) using JMS API.
- Involved in managing and reviewing Hadoop log files.
- Installed and configured Hadoop, YARN, MapReduce, Flume and HDFS; developed multiple MapReduce jobs in Java for data cleaning.
- Coded Hadoop Map Reduce jobs for energy generation and PS.
- Coded using Servlets, SOAP Client and Apache CXF REST APIs to deliver data from our application to external and internal systems over the communication protocol.
- Worked on Cloudera distribution system for running Hadoop jobs on it.
- Expertise in writing Hadoop Jobs to analyze data using Map Reduce, Hive, Pig and Solr, Splunk.
- Created a SOAP web service using JAX-WS to enable clients to consume it.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice-versa.
- Experienced in designing and developing multi-tier scalable applications using Java and J2EE Design Patterns.
Environment: Java, HTML, JavaScript, SQL Server, PL/SQL, JSP, Spring, Hibernate, Web Services, SOAP, SOA, JSF, JMS, JUnit, Oracle, Eclipse, SVN, XML, CSS, Log4j, Ant, Apache Tomcat.
Confidential
Java Developer
Responsibilities:
- Designed and developed the application using agile methodology.
- Implemented new modules and change requirements and fixed code, including defects identified in pre-production and production environments.
- Wrote the technical design document with class, sequence, and activity diagrams for each use case.
- Developed various reusable helper and utility classes which were used across all modules of the application.
- Involved in developing XML compilers using XQuery.
- Developed the application using the Spring MVC framework by implementing Controller and Service classes.
- Involved in writing the Spring configuration XML file that contains declarations and other dependent object declarations.
- Used Hibernate as the persistence framework, created DAOs, and used Hibernate for ORM mapping.
- Wrote Java classes to test the UI and web services through JUnit.
- Performed functional and integration testing; extensively involved in critical release and deployment activities.
- Responsible for designing rich user interface applications using JSP, JSP tag libraries, Spring tag libraries, JavaScript, CSS, HTML.
- Used SVN for version control. Log4J was used to log both User Interface and Domain Level Messages.
- Used SoapUI for testing the web services.
- Used Maven for dependency management and project structure.
- Created the deployment documents for various environments such as Test, QC, and UAT.
- Involved in system-wide enhancements, supporting the entire system and fixing reported bugs.
- Explored Spring MVC, Spring IOC, Spring AOP, and Hibernate in creating the POC.
- Performed data manipulation on the front end using JavaScript and JSON.
Environment: Java, J2EE, JSP, Spring, Hibernate, CSS, JavaScript, Oracle, JBoss, Maven, Eclipse, JUnit, Log4J, AJAX, Web services, JNDI, JMS, HTML, XML, XSD, XML Schema, SVN, Git.
