Senior Hadoop Developer/Admin Resume
Miami, Florida
SUMMARY
- Over 7 years of extensive IT experience with multinational clients, including four-plus years of Hadoop-related architecture experience developing Big Data/Hadoop applications.
- Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Flume, Oozie and ZooKeeper).
- 6+ years of experience in the design and development of data warehouse and business intelligence solutions using Ab Initio ETL tools.
- Well versed in configuring and administering Hadoop clusters using major Hadoop distributions such as Apache Hadoop and Cloudera.
- Proven expertise in performing analytics on Big Data using MapReduce, Hive and Pig.
- Experienced in performing real-time analytics on NoSQL databases such as HBase, MongoDB and Cassandra.
- Developed databases and projects using Python, Java, NoSQL and MySQL.
- Good knowledge of building event-processing data pipelines using Kafka, Storm and HBase.
- Experience with Talend DI installation, administration and development for data warehouse and application integration.
- Experienced with ETL processes to load data into Hadoop/NoSQL.
- Experienced with dimensional modeling, data migration, data cleansing, data profiling and ETL processes for data warehouses.
- Excellent knowledge of Hadoop architecture (HDFS, JobTracker, TaskTracker, NameNode, DataNode), the MapReduce programming paradigm and the Hortonworks distribution.
- Worked extensively with ETL tools such as Pentaho and Talend.
- Configured the Ab Initio environment to connect to databases using the DB configuration file and the Input Table, Output Table and Update Table components.
- Involved in developing solutions to analyze large data sets efficiently.
- Experience in writing complex SQL queries, SQL tuning, and PL/SQL objects such as stored procedures, functions, cursors, indexes, triggers and packages.
- Worked with the Oozie workflow engine to schedule time-based jobs that perform multiple actions.
- Experienced in importing and exporting data between RDBMS/Teradata and HDFS using Sqoop.
- Analyzed large data sets by writing Pig scripts and Hive queries.
- Implemented application logic and interaction with HBase and MongoDB.
- Experienced in writing MapReduce programs and UDFs for both Hive and Pig in Java.
- Used Flume to channel data from different sources to HDFS.
- Experience with configuration of Hadoop Ecosystem components: Hive, HBase, Pig, Sqoop, Mahout, Zookeeper and Flume.
- Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes for formats such as JSON and Avro.
- Expertise with NoSQL databases such as Cassandra and HBase.
- Supported MapReduce programs running on the cluster and wrote custom MapReduce programs in Java for data processing.
- Experience in developing test cases and performing unit and integration testing, with QA experience in manual and automated testing methodologies using tools such as WinRunner and JUnit.
- Developed Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying.
- Good knowledge of Apache Crunch and Hadoop HDFS admin shell commands.
- Experience testing MapReduce programs using MRUnit, JUnit and EasyMock.
- Experienced in implementing web-based, enterprise-level applications using Java and J2EE frameworks such as Spring, Hibernate, EJB, JMS and JSF.
- Experience with web-based UI development using jQuery, CSS, HTML5 and XHTML.
- Experienced in implementing and consuming SOAP web services using Spring with Apache CXF, and consuming REST web services using HTTP clients.
- Worked with developers, DBAs and systems support personnel to promote successful code to production and automate its deployment.
- Experienced in writing functions, stored procedures, and triggers using PL/SQL.
- Experienced with the build tools ANT and Maven and continuous integration tools such as Jenkins.
- Experienced in all facets of Software Development Life Cycle (Analysis, Design, Development, Testing and maintenance) using Waterfall and Agile methodologies
- Motivated team player with excellent communication, interpersonal, analytical and problem-solving skills and a zeal to learn new technologies.
- Highly adept at promptly and thoroughly mastering new technologies with a keen awareness of new industry developments and the evolution of next generation programming solutions.
TECHNICAL SKILLS
Programming Languages: C, C++, Java, Python, PHP, SQL, PL/SQL, Pig Latin, HiveQL, Unix shell scripting
Big Data Technologies: Hadoop, MapReduce, Spark, Sqoop, Teradata, Hive, Oozie, Pig, HDFS, ZooKeeper, Flume, Talend
J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
Web Technologies: AJAX, HTML5, JavaScript, CSS3
Frameworks: Spring 3.5 (Spring MVC, Spring ORM, Spring Security, Spring Roo), Hibernate, Struts
Application Servers: IBM WebSphere, JBoss, WebLogic
Web Services: WSDL, SOAP, Apache CXF, Apache Axis, REST, Jersey
Relational Databases: Oracle 10g/11g, MS SQL Server, MySQL
NoSQL Databases: MongoDB, HBase
Designing Tools: UML, Visio, Visual Paradigm
IDEs: Eclipse, NetBeans
Operating System: Unix, Windows
PROFESSIONAL EXPERIENCE
Confidential, Miami, Florida
Senior Hadoop Developer/Admin
Responsibilities:
- Handled importing of data from RDBMS into HDFS using Sqoop.
- Managing data flow into Pivotal HAWQ (Internal / External tables).
- Performed data cleansing using Pig Latin operations and UDFs.
- Wrote Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
- Installed, upgraded and maintained Hadoop clusters using Cloudera Manager.
- Involved in creating Hive tables, loading them with data and writing Hive queries to process the data.
- Created scripts to automate the process of Data Ingestion.
- Conducted predictive analysis using R Language and plotted graphs for predictive results.
- Served as Talend administrator with hands-on Big Data (Hadoop) experience on the Cloudera platform.
- Gathered requirements from senior management and delivered code enhancements and performance improvements for Ab Initio graphs.
- Wrote R code for statistical computing and visualization of Hive data to generate reports.
- Documented the installation, deployment, administration and operational processes of Talend MDM platform environments for ETL projects.
- Analyzed FACETS group information, subscriber enrollment, member additions, related entities, class/plan definitions and premium rate tables.
- Developed a proof of concept (POC) for real-time data ingestion using Kafka, Storm, ZooKeeper and HBase.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Used Maven to clean, compile and build, and ANT to deploy the JARs to HDFS.
- Developed simple to complex MapReduce jobs.
- Implemented application logic and interaction with HBase and MongoDB.
- Successfully loaded files into Hive and HDFS from Oracle, SQL Server, MySQL and Teradata using Sqoop.
- Worked hands-on with the ETL process, importing data from various sources and performing transformations.
- Involved in testing FACETS group information, subscriber enrollment, member additions, related entities, class/plan definitions and premium rate tables.
- Used ETL to load data into Hadoop/NoSQL.
- Analyzed data by running Hive queries and Pig scripts to understand user behavior.
- Developed simple to complex MapReduce jobs using Hive.
- Created partitioned tables in Hive.
- Installed and configured Hadoop 0.22.0 MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Performed solution planning with Java EE and Cassandra.
- Developed map/reduce code for Apache Spark in Python and Scala.
- Imported and exported data into HDFS and Hive using Sqoop.
- Used Maven for structured builds, along with Unix shell scripting.
- Responsible for managing data coming from different sources.
- Monitored running MapReduce programs on the cluster.
- Responsible for loading data from UNIX file systems to HDFS.
- Scheduled jobs using the TIDAL job scheduler.
- Developed Pig scripts for source data validation and transformation.
- Designed solution architecture including TIDAL jobs, SSIS packages and database objects.
- Handled different file formats such as text files, Avro data files, sequence files, XML and JSON files.
- Configured workflows, submitted jobs and implemented schedules using Cisco Tidal.
- Wrote Java programs to generate reports meeting business requirements using Java APIs and data from Pivotal HAWQ tables.
Environment: Pivotal HD 2.0, HDFS, HAWQ, ANT, Sqoop, Talend, Ab Initio, HBase, Cloudera, Teradata, ETL, Hive, Pig, SQL, PostgreSQL, R Language, pgAdmin, NoSQL, Storm, Cisco Tidal, shell scripting, Python, Java, Cassandra.
Confidential, Greenwood Village
Hadoop Developer/Admin
Responsibilities:
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and the loading of structured and unstructured data.
- Imported data from Teradata with Sqoop using the Teradata connector.
- Possess good Linux and shell scripting skills and familiarity with open-source configuration management and deployment tools such as Puppet or Chef.
- Involved in designing and developing a Storm and Kafka based pipeline.
- Tested and validated Identity Management CITT implementation as it pertains to the upgrade to CARE and FACETS enrollment system.
- Installed the OS and administered the Hadoop stack on the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging and performance management.
- Integrated the Quartz scheduler with Oozie workflows to pull data from multiple data sources in parallel using fork.
- Responsible for architecting integrated HIPAA, Medicare and FACETS solutions.
- Responsible for performing Predictive analysis on top of the customer usage data using R Language.
- Responsible for developing a metadata-driven, configurable component to execute Druid and Hive queries via a report query engine against the Hadoop data warehouse.
- Troubleshooting, debugging and fixing Talend specific issues, while maintaining the health and performance of the ETL environment.
- Developed complex generic graphs in Ab Initio.
- Performed system testing using Informatica and Tidal jobs for validation.
- Created build scripts using Maven and ANT.
- Processed multiple data source inputs in the same reducer using GenericWritable and MultipleInputs.
- Scheduled jobs through crontab and TIDAL.
- Created a data pipeline of MapReduce programs using chained mappers.
- Exported the analyzed patterns back to Teradata using Sqoop.
- Developed Ab Initio graphs using database, dataset, repartition, transform, sort and partition components for extracting, transforming and loading external data, creating DMLs, XFRs and SQLs.
- Visualized HDFS data for customers using a BI tool with the Hive ODBC driver.
- Implemented optimized joins across different data sets to get the top claims by state using MapReduce.
- Worked on big data processing of clinical and non-clinical data using MapReduce.
- Implemented complex MapReduce programs in Java to perform map-side joins using the Distributed Cache.
- Migrated ETL jobs to Pig scripts to perform transformations, joins and some pre-aggregations before storing the data in HDFS.
- Worked on implementing Spark with Scala.
- Extracted data from Teradata into HDFS using Sqoop.
- Configured data protection and privacy on sensitive databases such as MySQL and MongoDB.
- Responsible for importing log files from various sources into HDFS using Flume
- Created a customized BI tool for the management team to perform query analytics using HiveQL.
- Used Hive and Pig to generate BI reports.
- Imported data from MySQL to HDFS on a regular basis using Sqoop.
- Worked hands-on with the ETL process using Python and Java.
- Modified and wrote Bash and Korn shell scripts to optimize day-to-day administration.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Created Hive generic UDFs to process business logic that varies by policy.
- Moved relational database data into Hive dynamic partition tables using Sqoop and staging tables.
- Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Implemented application logic with HBase.
- Worked with different compression techniques such as LZO, Gzip and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Developed unit test cases using the JUnit, EasyMock and MRUnit testing frameworks.
- Monitored the cluster using Cloudera Manager.
Environment: Hadoop, HDFS, HBase, MongoDB, ANT, Talend, Druid, Storm, Cloudera, Teradata, ETL, Deployment tools, Bash, Korn, R Language, Spark, MapReduce, Python, Maven, Java, Hive, Pig, Sqoop, Ab Initio, Flume, Oozie, Hue, SQL, Cloudera Manager, MySQL.
Confidential - Austin, TX
Hadoop Developer/Admin
Responsibilities:
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed simple to complex MapReduce jobs.
- Streamed data in real time using Spark and Kafka.
- Analyzed data by running Hive queries and Pig scripts to understand user behavior.
- Developed simple to complex MapReduce jobs using Hive.
- Created partitioned tables in Hive.
- Administered and supported the Hortonworks distribution.
- Wrote Korn shell, Bash shell and Perl scripts to automate most DB maintenance tasks.
- Installed and configured Hadoop 0.22.0 MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported data into HDFS using Spark and Kafka.
- Used Maven for continuous build integration and deployment.
- Imported and exported data into HDFS and Hive using Sqoop.
- Developed and tested scripts in Python
- Responsible for managing data coming from different sources.
- Monitored running MapReduce programs on the cluster.
- Responsible for loading data from UNIX file systems to HDFS.
- Installed and configured Hive and also wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run as MapReduce jobs in the backend.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Developed scripts to automate end-to-end data management and synchronization between all the clusters.
Environment: Apache Hadoop, Java (jdk1.6), Bash, Spark, Kafka, Korn, Hortonworks, Deployment tools, Python, DataStax, Flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, UNIX, Sqoop, Hive, Oozie.
Confidential - San Ramon, CA
Hadoop Developer
Responsibilities:
- Analyzed large data sets by running Hive queries and Pig scripts
- Involved in creating Hive tables, and loading and analyzing data using hive queries
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Familiar with NoSQL databases such as MongoDB.
- Performed Hadoop installation and configuration of multiple nodes on AWS EC2 using the Hortonworks platform.
- Created Oracle Schedules and Control-M jobs for execution of some Oracle stored procedures on a scheduled basis.
- Designed the ETL process from various sources into the Hadoop/HDFS for analysis and further processing.
- Monitored system health and logs and responded accordingly to any warning or failure conditions.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Developed unit test cases for Hadoop MapReduce jobs with MRUnit
- Developed multiple Map Reduce jobs in java for data cleaning and preprocessing
- Involved in loading data from LINUX file system to HDFS
- Worked with Informatica 8.6x and above (Source Analyzer, Mapping Designer, Mapplet Designer, Transformations Designer, Warehouse Designer, Repository Manager and Workflow Manager/Server Manager). Learned Talend out of personal interest and used it in the project to simplify the ETL work.
- Responsible for analyzing, designing and coding of applications using Perl.
- Worked on data conversion by extracting data from databases, reformatting it and loading it into Cassandra nodes.
- Implemented data ingestion and cluster handling for real-time processing using Kafka.
- Ran Hadoop Streaming jobs to process terabytes of XML-format data.
- Assisted in exporting analyzed data to relational databases using Sqoop
- Supported MapReduce programs running on the cluster.
- Created and maintained technical documentation for launching Hadoop clusters and executing Hive queries and Pig scripts.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Hortonworks, Talend, ETL, Perl, MongoDB, Sqoop, Oozie, Kafka, Big Data, Python, Apache, Java (jdk1.6), DataStax, Flat files, Oracle 11g/10g, MySQL, Toad, Windows NT, Linux, Cassandra.
Confidential - Phoenix, AZ
Java Programmer
Responsibilities:
- Used Rational Rose for Use Case Diagram, Class Diagrams, Sequence diagrams and Object diagrams in design phase.
- Involved in creation of UML diagrams like Class, Activity, and Sequence Diagrams using modeling tools of IBM Rational Rose
- Involved in the full life cycle development of the modules for the project.
- Used the Eclipse IDE for application development and applied expertise in UNIX shell scripting and Python.
- Used the Spring framework for dependency injection and worked hands-on with lambda expressions.
- Worked with Spring AOP for transactions and logging.
- Designed, built and maintained automated load test scripts using the Neustar or JMeter load testing tools.
- Used JBoss application server for deploying applications.
- Used SOAP XML Web services for transferring data between different applications.
- Developed web services using a top-down approach, from WSDL to Java.
- Used the MVC design pattern to design the application, with JSP as the view component.
- Implemented the persistence layer using the Hibernate framework and integrated Hibernate with the Spring framework.
- Worked with complex SQL queries, SQL Joins and Stored Procedures using TOAD for data retrieval and update.
- Used JUnit for performing unit testing.
- Used Log4J to capture the logs that included runtime exceptions.
Environment: Eclipse, Web Services, UML, Struts (MVC), Lambda Expressions, Shell Scripting, Hibernate, Python, Spring, JSP, WSDL, JMS, Rational Rose, JavaScript, JUnit, PL/SQL, Oracle 10g, SVN
Confidential
JAVA/J2EE Developer
Responsibilities:
- Developed lightweight business components and integrated applications using Struts.
- Designed and developed front-end, middleware and back-end applications.
- Optimized server-side and client-side validation.
- Converted old Perl scripts into new Python scripts, adding new functions and features, and developed automated test methods and documentation for these scripts.
- Worked with the team to help transition from Oracle to DB2.
- Developed the global logging module, used across all modules, with Log4J components.
- Developed the presentation layer for the credit enhancement module in JSP.
- Used Struts to implement the Model-View-Controller (MVC) architecture, with validations performed on both the client side and the server side.
- Involved in the configuration management using ClearCase.
- Extensive experience working with LINQ to Objects, LINQ to XML and lambda expressions.
- Detected and resolved errors/defects in the quality control environment.
- Used iBATIS to map Java classes to the database.
- Involved in Code review and integration testing.
- Used static analysis tools such as PMD, FindBugs and Checkstyle.
Environment: Java v1.6, J2EE 6, Struts 1.2, iBATIS, XML, Lambda Expressions, Perl, JSP, CSS, Python, HTML, JavaScript, jQuery, Oracle 10g, DB2, Unix, RAD, NetBeans, ClearCase, WebSphere V8.0 (beta)
