Sr. Hadoop/Big Data Developer Resume
Texas
SUMMARY:
- Over 9 years of programming experience involving all phases of the Software Development Life Cycle (SDLC).
- Around 5 years of Big Data architecture experience developing Hadoop applications.
- Experienced with Hadoop ecosystem components such as MapReduce, Cloudera, Hortonworks, HBase, Oozie, Hive, Sqoop, Pig, Flume, Kafka, Storm, Spark, MongoDB, and Cassandra.
- Proven expertise in performing analytics on Big Data using MapReduce, Hive, and Pig.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
- Developed Apache Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying.
- Hands-on experience developing and debugging YARN (MR2) jobs to process large datasets.
- Processed data using MapReduce and YARN; worked on Kafka as a proof of concept for log processing.
- Experienced in performing real-time analytics on NoSQL databases like HBase and Cassandra.
- Good knowledge in working with Impala, Storm and Kafka.
- Experienced with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
- Experience with web services, data modeling, content processing, and data replication.
- Worked with the Oozie workflow engine to schedule time-based jobs that perform multiple actions.
- Experienced in importing and exporting data from RDBMS into HDFS using Sqoop.
- Familiar with the distributed coordination system ZooKeeper.
- Designed and developed test plans and test cases based on functional and design specifications.
- Analyzed large amounts of data sets using Pig scripts and Hive scripts.
- Experience in data warehousing with ETL tool Oracle Warehouse Builder (OWB).
- Hands-on experience working with databases such as Oracle and MySQL, and with PL/SQL.
- Good working knowledge of processing batch applications.
- Experience with streaming data using IBM Streams Processing Language.
- Experience in capturing and analyzing data in motion using InfoSphere Streams.
- Experienced in writing MapReduce programs and UDFs for both Hive and Pig in Java.
- Experienced in developing Web Services with Python programming language.
- Involved in design and development of technical specifications using Hadoop ecosystem tools.
- Cutting-edge experience with Splunk (a log-based performance monitoring tool).
- Experience with configuration of Hadoop Ecosystem components: Hive, HBase, Pig, Sqoop and Flume.
- Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as JSON and Avro (see the sketch after this summary).
- Experience in using different file formats - Avro, Sequence Files, ORC, JSON and Parquet.
- Experience in Performance Tuning, Optimization and Customization.
- Experience with unit testing MapReduce programs using MRUnit and JUnit.
- Experience in active development as well as onsite coordination in web-based, client/server, and distributed architectures using Java/J2EE, including Web Services, Spring, Struts, Hibernate, and JSP/Servlets, incorporating the MVC architecture.
- Good working knowledge of servers like Tomcat and WebLogic 8.0.
- Ability to work in a team as well as individually; quick learner and able to meet deadlines.
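The Hive experience noted above can be illustrated with a minimal sketch (hypothetical table, column, and path names): a JSON SerDe staging table and a partitioned Avro table loaded with dynamic partitioning through Spark's Hive support, assuming the hive-hcatalog-core jar is on the classpath for the JSON SerDe.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport() // route the DDL/DML below through the Hive metastore
      .getOrCreate()

    // Staging table over raw JSON files, read through the Hive JSON SerDe
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS events_json (
        |  customer_id STRING,
        |  amount      DOUBLE,
        |  load_date   STRING
        |)
        |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
        |LOCATION '/data/raw/events'""".stripMargin)

    // Target table partitioned by load_date and stored as Avro; a
    // CLUSTERED BY (customer_id) INTO N BUCKETS clause would add bucketing
    spark.sql(
      """CREATE TABLE IF NOT EXISTS events_avro (
        |  customer_id STRING,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (load_date STRING)
        |STORED AS AVRO""".stripMargin)

    // Dynamic-partition insert: partitions are derived from the load_date values
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE events_avro PARTITION (load_date)
        |SELECT customer_id, amount, load_date FROM events_json""".stripMargin)

    spark.stop()
  }
}
```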
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Spark, Spark Streaming, Spark SQL, Kafka, Cloudera CDH4, CDH5, Hortonworks, Hadoop Streaming, Splunk, ZooKeeper, Oozie, Sqoop, Flume, Impala, Solr, and Ranger.
NoSQL: HBase, MongoDB, Couchbase, Neo4j, Cassandra
Languages: Java/ J2EE, SQL, Shell Scripting, C/C++, Python, Scala
Web Technologies: HTML, JavaScript, CSS, XML, Servlets, SOAP, Amazon AWS, Google App Engine
Web/ Application Server: Apache Tomcat Server, LDAP, JBOSS, IIS
Operating system: Windows, Macintosh, Linux and Unix
Frameworks: Spring, MVC, Hibernate, Swing
DBMS / RDBMS: Oracle 11g/10g/9i, SQL Server 2012/2008, MySQL
IDE: Eclipse, Microsoft Visual Studio (2008, 2012), NetBeans, Spring Tool Suite
Version Control: SVN, CVS, Rational ClearCase Remote Client, GitHub, Visual Studio
Tools: FileZilla, PuTTY, TOAD SQL Client, MySQL Workbench, ETL, DWH, JUnit, Oracle SQL Developer, WinSCP, Tahiti, Cygwin, Pentaho
PROFESSIONAL EXPERIENCE:
Sr. Hadoop/Big Data Developer
Confidential, Texas
Responsibilities:
- Experience in the design and deployment of a Hadoop cluster and different Big Data analytic tools, including Hive, HBase, Oozie, Sqoop, Impala, Kafka, and Spark.
- Performed real-time streaming of data using Kafka, processed it with Spark Streaming, and loaded it into Kudu tables.
- Developed a solution in Scala for handling offsets for Kafka topics, persisting offsets in HBase to minimize data loss and reprocessing time (see the sketch after this list).
- Implemented near-real-time aggregations by joining multiple topics with intermediate storage in Kudu; the aggregations run at a 10-minute interval.
- Implemented a solution for data archiving process with configurable intervals.
- Good understanding of Kudu partitioning and primary keys.
- Configured workflows for scheduling Spark jobs with Oozie.
- Good understanding of integrating multiple ecosystem components, such as Spark-HBase and Spark-Kudu.
- Experience in integrating Kafka with Spark Streaming for real time data processing.
- Extracted complex structured streaming data from Kafka, processed it using Spark Streaming, and loaded it into Kudu tables.
- Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs and DataFrames.
- Used Spark API over Hadoop YARN as execution engine for data analytics using Hive.
- Configured Zookeeper for coordinating the cluster to maintain data consistency.
- Involved in loading and transforming of large sets of structured, semi-structured and unstructured data into HDFS.
- Implemented dynamic partitioning and bucketing in Hive.
- Experienced in performance tuning of Spark Applications for setting right Batch Interval time, correct level of Parallelism and memory tuning.
- Worked on Cluster Monitoring, troubleshooting and Disk topology.
- Processed different file formats like AVRO, PARQUET, Sequence files and ORC.
- Involved in Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
- Worked with data scientists to design and develop solutions for data analysis.
- Designed the ETL process from various sources into Hadoop/HDFS for analysis and further processing.
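A minimal sketch of how the Kafka offset handling described above can look in Scala, assuming the spark-streaming-kafka-0-10 direct stream and the HBase client API; broker, topic, group, table, and column-family names are hypothetical placeholders, and the Kudu aggregation/upsert step is elided.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils, LocationStrategies}

object KafkaOffsetsInHBaseSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-offsets-in-hbase-sketch")
    val ssc = new StreamingContext(conf, Seconds(600)) // 10-minute micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "near-real-time-aggregations",
      "enable.auto.commit" -> (false: java.lang.Boolean) // offsets are managed by the job
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      // ... aggregate the batch and upsert the results into Kudu here ...

      // Persist the ending offset of each partition so a restarted job can resume
      // from the last processed position instead of reprocessing the topic.
      val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = connection.getTable(TableName.valueOf("stream_offsets"))
      offsetRanges.foreach { range =>
        val put = new Put(Bytes.toBytes(s"${range.topic}:${range.partition}"))
        put.addColumn(Bytes.toBytes("o"), Bytes.toBytes("until"), Bytes.toBytes(range.untilOffset))
        table.put(put)
      }
      table.close()
      connection.close()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

On restart, the persisted offsets would be read back from HBase and passed to ConsumerStrategies.Assign so the stream resumes from the last processed position.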
Confidential, Two Destiny Way, Texas
Sr. Hadoop Developer
Responsibilities:
- Assisted in the development of high-quality analytical reports of weather and GIS data.
- Used Talend to generate optimized code to load, transform, enrich, and cleanse data inside Hadoop.
- Moved relational database data into the Data Lake and Hive dynamic partition tables using Sqoop as the ETL tool.
- Built Hadoop clickstream workflows using Apache Hive and Pig for extraction, transformation, and loading of data.
- Imported unstructured data like logs from different web servers to HDFS using Flume and developed MapReduce jobs for log analysis, recommendations and analytics.
- Involved in real-time data processing using Storm.
- Expertise in real-time analytics, machine learning and continuous monitoring of operations using Storm.
- Converted and loaded local data files into HDFS through the UNIX shell.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created HBase tables to store variable data formats coming from different portfolios.
- Worked with Hive and the NoSQL database HBase to create tables and store data.
- Worked with Apache NiFi to develop custom processors for processing and distributing data among cloud systems.
- Used Apache NiFi to uncompress and move JSON files from the local file system to HDFS.
- Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
- Used Kafka along with HBase for data streaming.
- Developed a Cassandra data model to match the business requirements.
- Involved in Administration of Cassandra cluster along with Hadoop, Pig and Hive.
- Analyzed customer behavior by performing clickstream analysis and used Flume to ingest the data.
- Developed Spark scripts using the Scala shell as per requirements.
- Created Views from Hive Tables on top of data residing in Data Lake.
- Built advanced ETL logic on clickstream, log, and tax data based on complex technical and business requirements.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- In the data exploration stage, used Hive and Impala to get insights into the customer data.
- Evaluated Spark's performance versus Impala on transactional data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this section).
- Involved in configuring batch jobs to ingest source files into the Data Lake.
- Wrote and executed various MySQL database queries from Python using the Python MySQL connector and the MySQLdb package.
- Worked with GitHub to check in and check out source code.
- Wrote scripts in Python for extracting data from HTML files.
- Worked with the NoSQL database Cassandra to store, retrieve, update, and manage all the details for Ethernet provisioning and customer order tracking.
- Implemented Spark for fast, interactive analysis of datasets loaded as RDDs.
- Analyzed the data by running Hive queries (HiveQL), Pig scripts, Spark SQL, and Spark Streaming.
- Developed tools using Python, shell scripting, and XML to automate routine tasks.
- Used Pig for aggregation, cleaning and incremental ETL functions and developing UDFs for filtering.
- Working experience creating and maintaining MySQL databases, setting up users, and backing up cluster metadata databases with cron jobs.
- Worked on the Redshift data model and Tableau Server configurations to provide guaranteed response times for reports.
- Implemented a variety of AWS computing and networking services to meet application needs.
- Migrated an existing on-premises application to AWS.
- Developed Pig UDFs to specifically preprocess and filter data sets for analysis.
- Used Teradata database management system to manage the warehousing operations and parallel processing.
- Validated data sets graphically with Excel and did touch ups in Photoshop.
- Designed & scheduled workflows for updating system reports using Oozie.
- Able to work in a team environment and solve problems.
Environment: Cloudera, Avro, HBase, HDFS, Hive, Pig, Java (JDK 1.7), SQL, Sqoop, Flume, Oozie, Eclipse, Splunk, YARN, SQL Server, Spark, Python, Hortonworks, ZooKeeper, SVN, Talend
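A minimal sketch of the Hive-to-Spark conversion mentioned above (hypothetical table and column names), showing the same aggregation expressed first in HiveQL and then as Spark DataFrame transformations in Scala.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Reference result straight from HiveQL: daily click counts per page
    val viaSql = spark.sql(
      """SELECT page, to_date(event_ts) AS event_date, COUNT(*) AS clicks
        |FROM clickstream
        |GROUP BY page, to_date(event_ts)""".stripMargin)
    viaSql.show(5) // sanity-check the HiveQL version

    // The same logic expressed as DataFrame transformations
    val viaDf = spark.table("clickstream")
      .withColumn("event_date", to_date(col("event_ts")))
      .groupBy(col("page"), col("event_date"))
      .agg(count(lit(1)).alias("clicks"))

    viaDf.write.mode("overwrite").saveAsTable("clickstream_daily_counts")
    spark.stop()
  }
}
```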
Confidential, Dallas, Texas
Hadoop Developer
Responsibilities:
- Worked on importing data from various sources and performed transformations using MapReduce and Hive to load data into HDFS.
- Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows.
- Worked on setting up Pig, Hive, and HBase on multiple nodes and developed using Pig, Hive, HBase, and MapReduce.
- Involved in data acquisition, data pre-processing, and data exploration for a telecommunication project in Scala.
- Experience with MapReduce coding.
- Solved the small-file problem using SequenceFile processing in MapReduce (see the sketch after this section).
- Wrote various Hive and Pig scripts.
- Experience in upgrading CDH and HDP clusters.
- Used Flume, Sqoop, Hadoop, Spark, and Oozie to build data pipelines.
- Created HBase tables to store variable data formats coming from different portfolios.
- Experience in upgrading the Hadoop cluster (HBase/ZooKeeper) from CDH3 to CDH4.
- Performed real-time analytics on HBase using the Java API and REST API.
- Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis.
- Implemented complex MapReduce programs to perform joins on the map side using the distributed cache.
- Set up Flume for different sources to bring log messages from outside into HDFS.
- Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
- Worked on compression mechanisms to optimize MapReduce Jobs.
- Hands-on experience with real-time analytics and BI.
- Wrote Python scripts to parse XML documents and load the data into a database.
- Experienced working with Avro data files using the Avro serialization system.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
- Unit tested and tuned SQLs and ETL Code for better performance.
- Monitored performance and identified performance bottlenecks in ETL code.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: MapReduce, HBase, HDFS, Hive, Pig, Java, SQL, Cloudera Manager, Sqoop, Flume, ZooKeeper, YARN, Oozie, Eclipse
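A minimal sketch of the small-file mitigation mentioned above: packing many small HDFS files into one SequenceFile (file name as key, raw bytes as value) so downstream MapReduce jobs read a single splittable file instead of thousands of tiny ones; paths are hypothetical placeholders.

```scala
import org.apache.commons.io.IOUtils
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.{BytesWritable, SequenceFile, Text}
import org.apache.hadoop.io.SequenceFile.Writer

object SmallFilePacker {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val fs = FileSystem.get(conf)

    // Block-compressed SequenceFile that will hold all of the small files
    val writer = SequenceFile.createWriter(conf,
      Writer.file(new Path("/data/packed/logs.seq")),
      Writer.keyClass(classOf[Text]),
      Writer.valueClass(classOf[BytesWritable]),
      Writer.compression(SequenceFile.CompressionType.BLOCK))

    try {
      // Key = original file name, value = raw bytes of the small file
      fs.listStatus(new Path("/data/incoming/small-files")).filter(_.isFile).foreach { status =>
        val in = fs.open(status.getPath)
        try {
          val bytes = IOUtils.toByteArray(in)
          writer.append(new Text(status.getPath.getName), new BytesWritable(bytes))
        } finally {
          in.close()
        }
      }
    } finally {
      writer.close()
    }
  }
}
```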
Confidential, Wilmington, DE
Big Data/Hadoop Developer
Responsibilities:
- Created Hive tables and worked on them using HiveQL.
- Involved in installing Hadoop Ecosystem components.
- Validated NameNode and DataNode status in an HDFS cluster.
- Imported and exported data between HDFS and RDBMS using Sqoop.
- Experienced in developing Hive queries on different data formats such as text and CSV files.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
- Installed and configured Hadoop clusters in test and production environments.
- Performed both major and minor upgrades to the existing CDH cluster.
- Performed code reviews per the customer's coding standards.
- Tested and provided valid test data to users as per requirements.
- Weekly meetings with technical collaborators and active participation in code review sessions with senior and junior developers.
- Responsible for managing data coming from different sources.
- Supported HBase architecture design with the Hadoop architect team to develop a database design in HDFS.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Wrote Hive queries for data analysis to meet the business requirements.
- Installed and configured Pig and wrote Pig Latin scripts.
- Developed UDFs for Pig data analysis (see the sketch after this section).
- Involved in managing and reviewing Hadoop log files.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Utilized the Agile Scrum methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
- Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Hive queries to process the data and generate the data cubes for visualizing.
Environment: Java, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, ZooKeeper, Linux, XML, Eclipse, Cloudera
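A minimal sketch of a Pig EvalFunc-style UDF compiled for the JVM (written here in Scala; the class name and field semantics are hypothetical), normalizing a text field before grouping in a Pig Latin script. The compiled jar would be registered in Pig with REGISTER and the function invoked by its class name.

```scala
import java.io.IOException
import org.apache.pig.EvalFunc
import org.apache.pig.data.Tuple

// Hypothetical UDF: trims and lowercases a chararray field so that grouping
// and aggregation in the Pig script are not skewed by casing or whitespace.
class NormalizeText extends EvalFunc[String] {
  @throws[IOException]
  override def exec(input: Tuple): String = {
    if (input == null || input.size() == 0 || input.get(0) == null) {
      null // propagate nulls instead of failing the task
    } else {
      input.get(0).toString.trim.toLowerCase
    }
  }
}
```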
Confidential
Java Developer
Responsibilities:
- Performed requirements analysis and design, and used the Rational Unified Process for analysis, design, and documentation.
- Interacted with the client to understand the requirements thoroughly in a short span of time.
- Developed Use cases, traceability matrices, design specification and test documents including UATs, training and implementation manuals.
- Involved in design and development of the architecture of the web applications in JSP.
- Developed various Java classes, SQL queries and procedures to retrieve and manipulate the data from backend Oracle database using JDBC.
- Involved in developing programs for parsing the XML documents using XML Parser.
- Involved in tuning and profiling the application for better data transactions and performance; used JProbe for the same.
- Designed and developed Enterprise Stateless and Stateful Session beans to communicate with the Container Managed Entity Bean backend services.
- Involved in Unit Testing and Integration Testing.
- Designed and developed various modules of the application with J2EE design architecture and frameworks like Spring MVC architecture and Spring Bean Factory using IOC, AOP concept.
- Followed agile software development with Scrum methodology.
- Wrote the application front end with HTML, JSP, JSF, Ajax/jQuery, Spring Web Flow, and XHTML.
- Used jQuery for UI-centric Ajax behavior.
- Implemented Java/J2EE design patterns such as Factory, DAO, Session Façade, and Singleton.
- Used Hibernate in the persistence layer and developed POJOs and Data Access Objects (DAOs) to handle all database operations.
- Implemented features like logging, user session validation using Spring-AOP module.
- Developed server-side services using Java, Spring, and Web Services (SOAP, WSDL, JAXB, JAX-RPC).
- Worked on Oracle as the backend database.
- Used JMS for messaging.
- Used Log4j to assign, track, report and audit the issues in the application.
- Developed and executed unit test plans using JUnit, ensuring that results are documented and reviewed with Quality Assurance teams responsible for integrated testing.
- Worked in deadline driven environment with immediate feature release cycles.
Environment: Java, Spring, Hibernate, JSP, HTML, CSS, XML, JavaScript, jQuery, JUnit, AJAX, Multi-Threading, Oracle, Web Services (SOAP), WebSphere, MySQL
Confidential
Java Developer
Responsibilities:
- Participated in various phases of the Software Development Life Cycle (SDLC).
- Developed user interfaces using the JSP framework with AJAX, JavaScript, HTML, XHTML, and CSS.
- Performed the design and development of various modules using the CBD Navigator Framework.
- Deployed J2EE applications to the WebSphere application server by building and deploying EAR files using ANT scripts.
- Created tables and stored procedures in SQL for data manipulation and retrieval.
- Used technologies like JSP, JavaScript and Tiles for Presentation tier.
- Used CVS for version control of code and project documents.
Environment: JSP, Servlets, JDK, JDBC, XML, JavaScript, HTML, Spring MVC, JSF, Oracle 8i, Sun Application Server, UML, JUnit, JTest, NetBeans, Windows 2000