Hadoop Developer Resume
Charlotte, NC
PROFESSIONAL SUMMARY:
- 8+ years of professional IT experience in analyzing requirements and designing and building highly distributed, mission-critical products and applications.
- 4+ years of data analytics experience on Apache Hadoop using the Cloudera and Hortonworks distributions.
- Expertise in the core Hadoop technology stack, including HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, HBase, Spark, Storm, Kafka and ZooKeeper.
- Experience with the AWS cloud environment, including S3 storage, EC2 instances and application deployments.
- In-depth knowledge of statistics, machine learning and data mining.
- Developed schedulers that communicated with cloud-based services (AWS) to retrieve data.
- Well versed in installing, configuring, supporting and managing Hadoop clusters and their underlying big data infrastructure.
- Experienced in implementing complex algorithms on semi-structured and unstructured data using MapReduce programs.
- Experienced in working with structured data using HiveQL: join operations, Hive UDFs, partitioning, bucketing and managed/external tables.
- Experienced in migrating ETL-style operations to Pig transformations, operators and UDFs.
- Good knowledge of Python.
- Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build a common learner data model, and persist the data in a NoSQL store (HBase); a sketch of this pattern follows this summary.
- Experienced in implementing POCs to migrate iterative MapReduce programs into Spark transformations using Scala.
- Working knowledge of business intelligence tools such as Tableau, and of Windows Azure.
- Specialized in data ingestion and processing from various RDBMS sources into a Hadoop cluster using MapReduce, Pig, Hive and Sqoop.
- Experienced in implementing a unified data platform that ingests data from different sources using Apache Kafka brokers and clusters with Java producers and consumers.
- Excellent working knowledge of Spark Core, Spark SQL and Spark Streaming.
- Developed Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying.
- Experienced in working with in-memory processing frameworks: Spark transformations, Spark SQL and Spark Streaming using Scala.
- Experienced in providing user-based recommendations by implementing collaborative filtering and matrix factorization, and classification techniques such as random forest, SVM and k-NN, using the Spark MLlib library.
- Excellent understanding of NoSQL databases such as HBase, Cassandra and MongoDB, as well as Teradata and data warehousing.
- Installed and configured Cassandra; good knowledge of Cassandra architecture, read/write paths and querying.
- Implemented frameworks in Java and Python to automate the ingestion flow.
- Involved in NoSQL (DataStax Cassandra) database design, integration and implementation; wrote scripts and invoked them using cqlsh.
- Involved in data modeling in Cassandra and in implementing sharding and replication strategies in MongoDB.
- Designed, developed and monitored Oracle NoSQL databases and Apache web/cloud server frameworks on Linux for high performance, along with VMware cloud storage for query performance tuning, ETL processes and large-file storage.
- Developed a fan-out Flume workflow that ingests data from various sources such as web servers and REST APIs into Hadoop using an HDFS sink.
- Experienced in implementing custom interceptors and serializers in Flume for specific customer requirements.
- Solid understanding of the REST architectural style and its application to well-performing web sites for global usage.
- Built data platforms, pipelines and storage systems using Apache Kafka, Apache Storm and search technologies such as Elasticsearch.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Monitored log input from several data centers via Spark Streaming; the data was analyzed in Apache Storm, then parsed and saved into Cassandra.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems (MySQL, Oracle, Teradata).
- Experience in developing Extraction, Transformation and Loading (ETL) strategies for moving data from various sources into data warehouses and data marts using Informatica.
- Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications and HDFS.
- Implemented Solr indexing for faster retrieval of required fields.
- Experience in managing Hadoop clusters using Cloudera Manager Tool.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Experience in administration, installation, configuration, troubleshooting, security, backup, performance monitoring and fine-tuning of Red Hat Linux.
- Worked on cluster coordination services through ZooKeeper.
- Actively involved in coding using core Java and collection APIs such as Lists, Sets and Maps.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Experience on different operating systems like UNIX, Linux and Windows.
- Experience with Java multithreading, collections, interfaces, synchronization and exception handling.
- Involved in writing PL/SQL stored procedures, triggers and complex queries.
- Worked in Agile environment with active scrum participation.
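The Kafka-to-HBase streaming bullet above, in rough outline: a minimal Scala sketch using the spark-streaming-kafka-0-10 direct stream. The broker address, topic, consumer group and per-learner aggregation are illustrative assumptions, and the HBase write is left as a stub rather than the actual learner-data-model code.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object LearnerModelStream {
  def main(args: Array[String]): Unit = {
    // Hypothetical broker, topic and consumer group.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "learner-model",
      "auto.offset.reset"  -> "latest"
    )

    val ssc = new StreamingContext(new SparkConf().setAppName("LearnerModelStream"), Seconds(10))
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("learner-events"), kafkaParams))

    // Aggregate each 10-second batch by learner id (assumed to be the record key),
    // then hand each partition to a writer; the HBase put is stubbed out here.
    stream.map(record => (record.key, 1L))
      .reduceByKey(_ + _)
      .foreachRDD { rdd =>
        rdd.foreachPartition { rows =>
          rows.foreach { case (learnerId, count) =>
            println(s"$learnerId -> $count") // replace with an HBase put per row
          }
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Batching at ten seconds keeps the aggregation near real time while giving each micro-batch enough records to make a bulk NoSQL write worthwhile.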
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, MongoDB, Cassandra, Flume, Oozie, ZooKeeper, AWS, Spark, Kafka, Teradata, Storm, ETL, Informatica, Talend, Solr, Scala, Jenkins, Elasticsearch.
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, Java Beans, Maven, Gradle, JUnit, TestNG.
IDEs: Eclipse, NetBeans, IntelliJ IDEA.
Frameworks: MVC, Struts, Hibernate, Spring.
Programming languages: C, C++, Java, Python, Ant scripts, Linux shell scripts
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server
Web Servers: WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL, JAX-RS, RESTful, JAX-WS.
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Version Controls: CVS, SVN, GIT.
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, Hive, Sqoop, Spark, AWS and Cloudera.
- The process involved extracting data through Sqoop, transforming it using Pig, Hive and PySpark, and loading it into Oracle.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on writing automated Sqoop scripts to import/export data from external sources such as Oracle and Teradata.
- Exported the analyzed data to relational databases using Sqoop for visualization and Tableau report generation.
- Worked on exporting Hive tables and HDFS files to external sources (Oracle).
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked on tuning the performance of Pig queries.
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Worked on landing data from Oracle into HDFS as files using Sqoop, then created views on top of them for querying and automated the process.
- Worked on handling special characters in data using Hive and Pig.
- Managed and reviewed Hadoop log files.
- Worked on automating batch jobs using JIL scripts for AutoSys.
- Worked on scheduling jobs through AutoSys and migrating jobs to all higher environments.
- Worked on migrating the code to higher environments.
- Supported ST and production runs.
- Implemented partitioning and bucketing in Hive and worked with file formats and compression techniques for optimization.
- Extensively worked on Text, ORC, Avro and Parquet file formats and compression techniques like Snappy, Gzip and Zlib.
- Developed Spark scripts using Python and Spark SQL to access Hive tables in Spark for faster data processing.
- Extensively used Spark SQL and PySpark APIs for querying and transforming data residing in Hive; a sketch of this pattern follows this list.
- Worked on custom Pig loaders to handle a variety of data formats such as JSON and CSV.
- Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
- Developed shell scripts to perform data profiling on the ingested data with the help of Hive.
- Experience in shell scripting for automation and monitoring.
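A minimal sketch of the partitioned-Hive querying pattern used in this project, written in Scala with Spark SQL (the bullets above also mention PySpark; the SQL itself is the same either way). The database, table and partition column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object DailyMetrics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DailyMetrics")
      .enableHiveSupport() // use the Hive metastore so partitioned/bucketed tables resolve
      .getOrCreate()

    // Hypothetical database, table and partition column; filtering on the partition
    // column (load_date) lets Spark prune partitions instead of scanning everything.
    val daily = spark.sql(
      """SELECT load_date, txn_type, COUNT(*) AS txn_count
        |FROM analytics.transactions
        |WHERE load_date = '2017-06-01'
        |GROUP BY load_date, txn_type""".stripMargin)

    // Write the aggregate out as Snappy-compressed ORC for downstream reporting.
    daily.write
      .mode("overwrite")
      .option("compression", "snappy")
      .orc("/data/reports/daily_txn_counts")

    spark.stop()
  }
}
```

Filtering on the partition column keeps the scan to a single partition, and the Snappy/ORC output matches the file formats and compression codecs listed above.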
Confidential, Dublin, OH
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, Hive, HBase, Sqoop, Flume, Cassandra, ZooKeeper, AWS and Cloudera.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them using MapReduce programs.
- Implemented MapReduce programs to retrieve top-K results from unstructured data sets.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS and exported data from HDFS to MySQL, Oracle and Teradata using Sqoop.
- Built data platforms, pipelines and storage systems using Apache Kafka, Apache Storm and search technologies such as Elasticsearch.
- Experienced in implementing POCs to migrate iterative MapReduce programs into Spark transformations using Scala.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Solid understanding of the REST architectural style and its application to well-performing web sites for global usage.
- Experience in the AWS cloud environment with S3 storage and EC2 instances.
- Developed a fan-out Flume workflow that ingests data from various sources such as web servers and REST APIs into Hadoop using an HDFS sink.
- Installed and configured Cassandra; good knowledge of Cassandra architecture, read/write paths and querying.
- Implemented various ETL solutions per the business requirements using Informatica.
- Experience creating ETL jobs to load JSON and server data into MongoDB and transform MongoDB data into the data warehouse.
- Implemented frameworks in Java and Python to automate the ingestion flow.
- Involved in data modeling in Cassandra and in choosing indexes and primary keys based on the client requirements.
- Developed Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS.
- Designed and implemented Solr indexes for the metadata that enabled internal applications to access Scopus content.
- Used Spark with Scala for parallel data processing and better performance.
- Extensively used Pig for data cleansing and for extracting data from web server output files to load into HDFS.
- Managed disk file systems, server performance, user creation, file access permissions and RAID configurations.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Implemented Kafka Java producers, created custom partitioners, configured brokers and implemented high-level consumers to build the data platform; see the producer sketch after this list.
- Implemented Storm topologies to preprocess data and custom groupings to configure partitions.
- Managed and reviewed Hadoop log files.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Installed and configured Pig and wrote Pig Latin scripts.
- Used Maven as the build tool, with builds scheduled/triggered by Jenkins.
- Responsible for managing data coming from different sources such as RDBMSs.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
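A sketch of the Kafka producer and custom-partitioner pattern referenced above, using the standard kafka-clients API from Scala. The topic, key layout (region:deviceId) and partitioning rule are assumptions for illustration, not the production configuration.

```scala
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, Partitioner, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.Cluster

// Hypothetical partitioner: keys look like "region:deviceId", and every record for a
// region should land on the same partition.
class RegionPartitioner extends Partitioner {
  override def partition(topic: String, key: AnyRef, keyBytes: Array[Byte],
                         value: AnyRef, valueBytes: Array[Byte], cluster: Cluster): Int = {
    val numPartitions = cluster.partitionsForTopic(topic).size
    val region = key.toString.takeWhile(_ != ':')
    math.abs(region.hashCode) % numPartitions
  }
  override def configure(configs: java.util.Map[String, _]): Unit = ()
  override def close(): Unit = ()
}

object EventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092") // hypothetical broker
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, classOf[RegionPartitioner].getName)

    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord("events", "us-east:device42", """{"reading": 1}"""))
    producer.close()
  }
}
```

Routing by a key prefix keeps each region's events on a stable partition, so downstream consumers or Storm bolts see them in order.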
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), SQL Server, Sqoop, Spark, Kafka, AWS, MongoDB, Storm, Cassandra, ETL, Python, REST API, XML, JSON, Solr, Cloudera, Oracle, Teradata, Scala, GIT, Agile, Jenkins, Elasticsearch.
Confidential - Memphis, TN
Hadoop Developer
Responsibilities:
- Installed and configured Cassandra; good knowledge of Cassandra architecture, read/write paths and querying using the Cassandra shell.
- Worked on writing MapReduce jobs to discover trends in data usage by customers.
- Designed and worked on a big data analytics platform for processing customer interface data and comments using Java, Hadoop, Hive, Pig and Cloudera.
- Involved in Hive-HBase integration by creating Hive external tables with HBase as the storage handler.
- Imported and exported data between HDFS/Hive and Oracle and Teradata using Sqoop.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Migrated various Hive UDFs and queries to Spark SQL for faster requests as part of a POC implementation.
- Developed a fan-out Flume workflow that ingests data from various sources such as web servers and REST APIs into Hadoop using an HDFS sink.
- Implemented frameworks in Java and Python to automate the ingestion flow.
- Experienced in defining job flows to run multiple MapReduce and Pig jobs using Oozie.
- Installed and configured Hive and wrote HiveQL scripts.
- Experience loading data into relational databases for reporting, dashboarding and ad hoc analyses, which revealed ways to lower operating costs and offset the rising cost of programming.
- Involved in ETL code deployment and performance tuning of mappings in Informatica.
- Created reports and dashboards using structured and unstructured data.
- Experienced in performing analytics on time-series data using HBase; see the row-key scan sketch after this list.
- Implemented HBase coprocessors (observers) for event-based analysis.
- Hands-on installing and configuring the nodes of a CDH4 Hadoop cluster on CentOS.
- Implemented Hive generic UDFs to encapsulate business logic.
- Experienced in accessing Hive tables from Java applications via JDBC to perform analytics.
- Experienced in running batch processes using Pig scripts and developed Pig UDFs for data manipulation according to business requirements.
- Used Maven as the build tool, with builds scheduled/triggered by Jenkins.
- Experience streaming workflow operations and Hadoop jobs using Oozie workflows, scheduled through AutoSys on a regular basis.
- Used the partitioning pattern in MapReduce to move records into different categories.
- Involved in downloading customer tweets from Twitter in JSON format using Flume and storing them in HDFS.
- Responsible for batch processing and real-time processing in HDFS and NoSQL databases.
- Responsible for retrieving data from Cassandra and ingesting it into Pig.
- Experience customizing the MapReduce framework at various levels by writing custom InputFormats, RecordReaders, Partitioners and data types.
- Experienced with multiple Hive file formats, including Avro and SequenceFile.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
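A sketch of the HBase time-series access pattern noted above, written against the HBase 1.x client API (the CDH4 cluster mentioned here would have used the older HTable API, but the scan logic is the same). The table name, column family and <deviceId>#<epochMillis> row-key layout are assumptions for illustration.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
import org.apache.hadoop.hbase.util.Bytes

object DeviceHistory {
  def main(args: Array[String]): Unit = {
    // Hypothetical table "sensor_events" with row keys of the form <deviceId>#<epochMillis>,
    // so a start/stop row scan over one device returns its readings in time order.
    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("sensor_events"))

    val scan = new Scan()
    scan.setStartRow(Bytes.toBytes("device42#"))
    scan.setStopRow(Bytes.toBytes("device42$")) // '$' sorts just after '#', closing the prefix range
    scan.addColumn(Bytes.toBytes("m"), Bytes.toBytes("value"))

    val scanner = table.getScanner(scan)
    try {
      var result = scanner.next()
      while (result != null) {
        val row = Bytes.toString(result.getRow)
        val value = Bytes.toString(result.getValue(Bytes.toBytes("m"), Bytes.toBytes("value")))
        println(s"$row -> $value")
        result = scanner.next()
      }
    } finally {
      scanner.close()
      table.close()
      conn.close()
    }
  }
}
```

Putting the timestamp after the device id in the row key keeps one device's readings contiguous, which is what makes this kind of range scan cheap.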
Environment: Cassandra, MapReduce, Spark SQL, ETL, Pig scripts, Flume, Hadoop, BI, Pig UDFs, Oozie, Avro, Hive, Java, Eclipse, ZooKeeper, Informatica, Oracle, Teradata, Python, REST API, JSON, XML, Cloudera, GIT, Agile, Jenkins.
Confidential, Indianapolis, IN
Hadoop Developer
Responsibilities:
- Involved in the complete software development life cycle (SDLC) to develop the application.
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, HBase, Sqoop, Cassandra and ZooKeeper.
- Involved in loading data from the Linux file system to HDFS.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Imported and exported data between HDFS/Hive and Oracle and Teradata using Sqoop.
- Implemented test scripts to support test driven development and continuous integration.
- Developed multiple MapReduce jobs in Java for data cleaning.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Performance tuning of JVM heap size, garbage collection, Java stack and native thread settings for production performance.
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Solid understanding of the REST architectural style and its application to well-performing web sites for global usage.
- Supported MapReduce programs running on the cluster.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Implemented frameworks in Java and Python to automate the ingestion flow.
- Worked on tuning the performance of Pig queries.
- Mentored analysts and the test team in writing Hive queries.
- Installed the Oozie workflow engine to run multiple MapReduce jobs.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Worked on ZooKeeper for coordination between the master nodes and data nodes.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Linux, Java, Oozie, HBase, ZooKeeper, SQL Server, Python, REST API, JSON, XML, Oracle, Teradata, GIT, Agile.
Confidential, Orlando, FL
Java /J2EE Developer
Responsibilities:
- Worked with business users to determine requirements and technical solutions.
- Followed Agile methodology (Scrum Standups, Sprint Planning, Sprint Review, Sprint Showcase and Sprint Retrospective meetings).
- Developed business components using core Java concepts and classes such as inheritance, polymorphism, collections, serialization and multithreading.
- Used the Spring framework to handle application logic and make calls to business objects, configuring them as Spring beans.
- Implemented and configured data sources and the session factory, and used HibernateTemplate to integrate Spring with Hibernate.
- Developed web services to allow communication between applications through SOAP over HTTP with JMS and Mule ESB.
- Actively involved in coding using core Java and collection APIs such as Lists, Sets and Maps.
- Developed a web service (SOAP, WSDL) shared between the front end and the cable bill review system.
- Implemented a REST-based web service using JAX-RS annotations and the Jersey implementation for data retrieval with JSON.
- Developed Maven scripts to build and deploy the application onto WebLogic Application Server, ran UNIX shell scripts and implemented the auto-deployment process.
- Used Maven as the build tool, with builds scheduled/triggered by Jenkins.
- Developed JUnit test cases for application unit testing.
- Implemented Hibernate for data persistence and management.
- Used the SoapUI tool for testing web service connectivity.
- Used SVN for version control to check in code, create branches and tag the code.
- Used RESTful services to interact with the client by providing RESTful URL mappings.
- Used the Log4j framework for application logging, tracking and debugging.
Environment: JDK 1.6, Eclipse IDE, Core Java, J2EE, Spring, Hibernate, Unix, Web Services, SoapUI, Maven, WebLogic Application Server, SQL Developer, Camel, JUnit, SVN, Agile, SONAR, Log4j, REST, JSON, jBPM.
Confidential
Java Developer
Responsibilities:
- Involved in analysis, design and development of Expense Processing system.
- Created user interfaces using JSP.
- Developed the web interface using Servlets, JavaServer Pages, HTML and CSS.
- Developed the DAO objects using JDBC.
- Developed business services using Servlets and Java.
- Designed and developed user interfaces and menus using HTML5, JSP and JavaScript, with client-side and server-side validations.
- Developed the GUI using JSP and the Struts framework.
- Involved in developing the presentation layer using Spring MVC, AngularJS and jQuery.
- Involved in designing the user interfaces using the Struts Tiles framework.
- Used the Spring 2.0 framework for dependency injection and integrated it with the Struts framework and Hibernate.
- Used Hibernate 3.0 in the data access layer to access and update information in the database.
- Experience with SOA (Service-Oriented Architecture), creating web services with SOAP and WSDL.
- Developed JUnit test cases for all the developed modules.
- Used Log4j to capture logs including runtime exceptions, monitored error logs and fixed the problems.
- Used RESTful services to interact with the client by providing RESTful URL mappings.
- Used CVS for version control across common source code used by developers.
- Used Ant scripts to build the application and deployed it on WebLogic Application Server 10.0.
Environment: Struts 1.2, Hibernate 3.0, Spring 2.5, JSP, Servlets, XML, SOAP, WSDL, JDBC, JavaScript, HTML, CVS, Log4j, JUnit, WebLogic App Server, Eclipse, Oracle, RESTful.
