Sr. Hadoop/Java Consultant Resume
Omaha, NE
PROFESSIONAL SUMMARY:
- A Hadoop-certified professional with over 9 years of IT experience, including 3+ years in Big Data and Hadoop ecosystem technologies, with domain experience in financial, banking, healthcare, retail and non-profit organizations across software development and application support.
- Excellent understanding/knowledge of Hadoop Ecosystem including HDFS, MapReduce, Hive, Pig, Storm, Kafka, Spark, YARN, HBase, Oozie, ZooKeeper, Flume and Sqoop based Big Data Platforms.
- Expertise in design and implementation of Big Data solutions in Banking, Retail and E-commerce domains.
- Worked on Cloudera and Hortonworks Hadoop distribution environments.
- Experience in cluster administration of Hadoop 2.X and CDH 5.X versions.
- Experience in Redhat Linux, Kerberos, Active Directory/LDAP.
- Experienced in Java and Python programming.
- Built APIs to integrate with multiple teams.
- Assisted in Cluster maintenance, Cluster Monitoring and Troubleshooting, Managing and Reviewing data backups and log files.
- Installing, configuring, monitoring and maintaining HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig and Hive.
- Involved in Hadoop cluster environment administration that includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
- Involved in Deployment of clusters in AWS
- Expertise in managing business critical large Hadoop clusters including configuring high availability features.
- Experience with NoSQL databases such as MongoDB and DynamoDB.
- Experienced in writing complex MapReduce programs that work with different file formats like Text, Sequence, XML and Avro.
- Expertise in composing MapReduce Pipelines with many user-defined functions
- Used Scala and Python for Spark SQL and Spark Streaming.
- Implemented a lambda architecture using Spark.
- Used Pig as an ETL tool for transformations, event joins, filtering and pre-aggregations.
- Implemented business logic by writing Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources.
- Expertise in Hive Query Language (HiveQL), Hive Security and debugging Hive issues.
- Involved in taking data backups to Amazon Web Services S3.
- Responsible for performing extensive data validation using Hive dynamic partitioning and bucketing.
- Experience in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL (see the UDF sketch at the end of this summary).
- Experience working with different Hive SerDes that handle file formats such as Avro and XML.
- Analyzed data with Hive queries and used Hive UDFs for complex querying.
- Expert database engineer with NoSQL and relational data modeling experience using Microsoft SQL Server.
- Expertise in HBase Cluster Setup, Configurations, HBase Implementation and HBase Client.
- Experience in Administering, Installation, Configuration, Troubleshooting, Security, Backup, Performance Monitoring and Fine-tuning of CentOS and Ubuntu.
- Experience in implementing the CMMI framework to improve business processes.
- Complete domain and development life cycle knowledge of data warehousing and client/server concepts, along with basic data modeling.
- Served as SME and led teams on major projects.
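A minimal sketch of the kind of custom Hive UDF described above, built against the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name, masking logic and column semantics are illustrative assumptions, not taken from any actual project.

```java
// Hypothetical Hive UDF that masks an account number, keeping only the last four digits.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MaskAccountUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String value = input.toString();
        if (value.length() <= 4) {
            return new Text(value);
        }
        // Keep the trailing four characters and mask the rest.
        return new Text("****" + value.substring(value.length() - 4));
    }
}
```

In Hive such a UDF would typically be registered with ADD JAR followed by CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccountUDF' before being used in queries.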
TECHNICAL SKILLS:
Web Technologies: JSP, HTML, CSS, JavaScript, JQuery
JEE Technologies: Servlets, Web Services, WebLogic, WebSphere, Apache Tomcat
Languages: Java, Pig Latin, Elasticsearch, C, C++, SQL, PL/SQL
HADOOP Components: Hive, PIG, Sqoop, Flume, MapReduce, HBase, Oozie, Zookeeper, Kafka, Spark
Databases: NoSQL, Oracle, Cassandra, DB2, MySQL, SQLite, MS SQL Server 2008/2012, MS Access.
Operating Systems: Windows 98/NT/XP/Vista/7, Windows CE, Linux, UNIX, MAC.
Methodologies: Agile, Rapid Application Development, Waterfall Model, Iterative Model
Design Patterns: Singleton, Adapter, Builder, Iterator, Template.
Frameworks: Hadoop, EJB, Struts.
PROFESSIONAL EXPERIENCE:
Confidential
Sr. Hadoop/Java Consultant
RESPONSIBILITIES:
- Loaded data from different sources, including Salesforce, into Hadoop.
- Developed consumer code for RabbitMQ to receive messages from different sources.
- Involved in importing data from large RDBMS systems into Hadoop.
- Worked on the Cloudera big data platform.
- Queried data using Hive on Cloudera for processing.
- Involved in designing and developing a process framework and supporting data migration in the Hadoop system.
- Created managed tables in Hive.
- Initiated Spark jobs to create JSON files (see the Spark sketch after this section).
- Involved in working with DataFrames in Spark.
- Experience with both Spark Streaming and Spark SQL.
- Extensive use of Java and Python for developing applications that extract and load records into Hadoop and AWS.
- Built applications that integrate with components and APIs created by different LOBs.
- Worked with Jenkins and Nexus to consolidate different APIs into a single API.
- Involved in loading files to AWS through CloudBox, an intermediary API, for Capital One.
- Loaded batch files to an AWS S3 bucket on a day-to-day basis.
- Extensive use of GitHub Enterprise for pushing code changes across development, QA and production environments.
- Pushed updated code to the Git repository.
- Kept track of Jenkins builds and dependencies through Nexus.
- Backed up data to AWS S3 in a timely manner.
- Maintained Hadoop clusters for dev/qa/production. Trained the development, administration, testing and analysis teams on Hadoop framework and Hadoop eco system.
- Gave extensive presentations on the Hadoop ecosystem, best practices and data architecture in Hadoop to testing teams.
- Designed and developed data warehouse and business intelligence architecture. Designed the ETL process from various sources into Hadoop/HDFS for analysis and further processing.
- Debugged and solved issues as the subject matter expert, focusing on issues around data science and processing.
- Defined business and technical requirements, designed proofs of concept with evaluation criteria and scoring, and selected data integration and information management solutions.
- Collaborate with infrastructure and security architects to integrate enterprise information architecture into overall enterprise architecture.
- Integrating big data technologies and analysis tools into the overall architecture.
- Kept track of stories and tasks in Version1 (Agile methodology).
- Completed tasks and stories within sprints.
Environment: Hadoop, Cloudera, HDFS, Spark, Spark SQL, MapReduce, Hive, Java 1.8, RabbitMQ, Jenkins, Nexus, GIT, Shell Scripting, Agile, Version1.
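A minimal sketch of a Spark job along the lines described above: read source records into a DataFrame, apply a filter, and write the result out as JSON. The input/output paths, column name and file format are placeholders, not details from the actual engagement.

```java
// Hypothetical Spark job: read Parquet records, filter them, and write JSON output.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class JsonExportJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("JsonExportJob")
                .getOrCreate();

        // Placeholder input path and column name.
        Dataset<Row> records = spark.read().parquet("hdfs:///data/source/records");
        Dataset<Row> active = records.filter(records.col("status").equalTo("ACTIVE"));

        // Placeholder output location; could equally be an HDFS path.
        active.write()
                .mode(SaveMode.Overwrite)
                .json("s3a://example-bucket/exports/records-json");

        spark.stop();
    }
}
```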
Confidential
Sr. Hadoop Consultant
RESPONSIBILITIES:
- Provided technical designs and architecture, supported automation, installation and configuration tasks, and planned system upgrades of the Hadoop cluster.
- Administered the architecture of the Hadoop cluster, MapReduce processes and the HBase system.
- Installed and configured multi-node fully distributed Hadoop cluster by installing all the necessary Hadoop ecosystem components on all the nodes using Cloudera Manager.
- Installed and managed Kafka as a messaging system to receive data from different sources (see the consumer sketch after this section).
- Involved in various administration tasks like commissioning, decommissioning of nodes, cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
- Involved in importing data from large RDBMS systems into Hadoop using Sqoop.
- Worked on the Cloudera big data platform.
- Queried data using Cloudera Impala for faster processing.
- Acted as an SME and module lead for two major projects undertaken with a team of three.
- Involved in designing and developing a process framework and supporting data migration in the Hadoop system.
- Worked on several Hadoop projects using Hive and Pig.
- Created managed tables in Hive.
- Involved in working with RDDs in Spark on a POC.
- Experience with both Spark Streaming and Spark SQL.
- Worked on Elasticsearch.
- Worked with Sqoop to import/export data between relational databases and Hadoop, and with Flume to collect data and populate it into Hadoop.
- Backed up data to AWS S3 in a timely manner.
- Involved in multiple POCs with databases such as MongoDB, DynamoDB and other NoSQL stores.
- Implemented and integrated Hadoop based business intelligence and Data Warehouse system including implementations of searching, filtering, indexing, aggregation for reporting and report generation and general information retrieval.
- Worked on POC involving Cassandra DB for low-latency real time applications.
- Maintained Hadoop clusters for dev/staging/production. Trained the development, administration, testing and analysis teams on Hadoop framework and Hadoop eco system.
- Involved in data migration from Oracle to HDFS.
- Made improvements to business processes using the CMMI framework.
- Used CMMI for developing solutions and delivering services.
- Debugged and solved issues as the subject matter expert.
- Developed an information strategy, in alignment with overall agency strategy, for master data management, data integration, data virtualization, metadata management, data quality and profiling, data modeling and data governance.
Environment: Hadoop, Cloudera, HBase, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Java 1.7, Oracle, Shell Scripting.
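A minimal sketch of the kind of Kafka consumer used to pull messages from the various sources mentioned above, written against a recent kafka-clients Java API; the broker address, group id and topic name are placeholders.

```java
// Hypothetical Kafka consumer that polls a topic and hands records off for ingestion.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class IngestConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        props.put("group.id", "ingest-group");            // placeholder consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("source-events")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // In the real pipeline the record would be landed in HDFS/HBase instead.
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```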
Confidential, Omaha, NE
Hadoop Admin
RESPONSIBILITIES:
- Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, manage and review data backups and log files.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop Clusters.
- Monitored multiple Hadoop clusters environments using Ganglia.
- Managing and scheduling Jobs on a Hadoop cluster.
- Responsible for applying Software Configuration Management processes to projects, setting up and maintaining GitHub infrastructure and supporting a continuous delivery model by automating software build and package migration processes.
- Involved in defining job flows, managing and reviewing log files.
- Installed Oozie workflow engine to run multiple MapReduce, Hive and Pig jobs.
- Implemented MapReduce programs to transform raw log data into a structured form and extract user information (see the sketch after this section).
- Responsible for loading and transforming large sets of structured, semi structured and unstructured data.
- Worked on the Hortonworks platform. Developed a data pipeline using Flume and Sqoop to ingest customer behavioral data and financial histories from traditional databases into HDFS for analysis.
- Collected the log data from web servers and integrated into HDFS using Flume.
- Responsible to manage data coming from different sources.
- Extracted data from traditional data sources into HDFS using Sqoop and pre-processed the data for analysis.
- Involved in using Sqoop for importing and exporting data into and out of HDFS.
- Unit tested and tuned SQLs and ETL Code for better performance.
- Performed major role in understanding the business requirements and designing and loading the data into data warehouse (ETL).
- Created and maintained Technical documentation for launching HADOOP Clusters and for executing Hive queries and Pig Scripts.
Environment: Hadoop, Hortonworks, HBase, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Oracle, MySQL.
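A minimal sketch of the sort of MapReduce job used to turn raw log lines into structured, per-user information as mentioned above. The log layout (whitespace-separated fields with the user id in the third field) is an assumption made purely for illustration.

```java
// Hypothetical MapReduce job counting log events per user id (assumed to be field 3).
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UserEventCount {

    public static class LogMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text userId = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\s+");
            if (fields.length > 2) {          // assumed layout: user id in the third field
                userId.set(fields[2]);
                context.write(userId, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "user-event-count");
        job.setJarByClass(UserEventCount.class);
        job.setMapperClass(LogMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```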
Confidential, Jacksonville, FL
Hadoop Developer
RESPONSIBILITIES:
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Led the team and assigned work to team members.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Developed Simple to complex MapReduce Jobs using Hive and Pig.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to the relational databases using Sqoop.
- Created partitioned tables in Hive.
- Managed and reviewed Hadoop log files.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs (see the sketch after this section).
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Designed and Developed Pig Latin scripts.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Loaded and transformed large sets of structured and semi-structured data.
- Responsible to manage data coming from different sources.
Environment: Hadoop, MapReduce, HBase, HDFS, Hive, Pig, SQL, Sqoop, Core Java.
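A minimal sketch of creating and loading a partitioned Hive table over the HiveServer2 JDBC driver, in the spirit of the Hive work above; the connection URL, table name, columns and staging path are placeholders.

```java
// Hypothetical setup: create a partitioned Hive table and load one day's extract into it via JDBC.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PartitionedTableSetup {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://hiveserver:10000/default";   // placeholder HiveServer2 URL

        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {

            stmt.execute(
                "CREATE TABLE IF NOT EXISTS web_events (" +
                "  user_id STRING, url STRING, ts BIGINT) " +
                "PARTITIONED BY (event_date STRING) " +
                "STORED AS ORC");

            // Load a single day's staged extract into its own partition (path is a placeholder).
            stmt.execute(
                "LOAD DATA INPATH '/staging/web_events/2016-05-01' " +
                "INTO TABLE web_events PARTITION (event_date='2016-05-01')");
        }
    }
}
```

Partitioning by a date column like this is what lets the downstream queries prune to a single day's data instead of scanning the whole table.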
Confidential
Java Developer
RESPONSIBILITIES:
- Designed the Business Domain Model & Entity Relationships based on the System Requirements using E-R diagrams and UML diagrams.
- Building and deployment of WAR, JAR files on different build machines.
- Responsible for maintaining and configuring Clear Case for the team
- Used JAXP to create Java objects from XML and vice versa (see the parsing sketch after this section).
- Developed the rules with java objects based on the Fair Isaac Blaze Advisor Environment.
- Involved in creating HIPAA request and parsing HIPAA response for HIPAA transactions
- Tested the rules with tools such as Test Harness and Claredi.
- Used web services for data exchange using SOAP and WSDL.
- Used agile methodology for project management.
Environment: Java 1.6, Fair Isaac Blaze Advisor Rules Engine 6.0, Rational ClearCase v7.0.0.1, Rational ClearQuest Web interface v7.0.0.1, Test Harness Testing Tool.
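A minimal sketch of the JAXP-style XML handling referenced above: parsing an XML request into a DOM tree and reading values out of it. The file name and element/attribute names are illustrative assumptions, not the actual HIPAA transaction layout.

```java
// Hypothetical JAXP example: parse an XML request and pull fields out of the DOM.
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RequestParser {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File("request.xml"));  // placeholder file name

        NodeList claims = doc.getElementsByTagName("claim");    // placeholder element name
        for (int i = 0; i < claims.getLength(); i++) {
            Element claim = (Element) claims.item(i);
            String id = claim.getAttribute("id");
            String status = claim.getElementsByTagName("status").item(0).getTextContent();
            System.out.println("claim " + id + " -> " + status);
        }
    }
}
```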
Confidential
Java/ Web Developer
RESPONSIBILITIES:
- Developed the GUI of the system using JSP, with client-side validations performed using JavaScript.
- Built and accessed the database using JDBC for Oracle.
- Wrote stored procedures in PL/SQL at the database end.
- Developed Web pages and login form of insurance company using HTML/DHTML/CSS.
- Involved in coding JSP for the new employee registration, login and Single Sign On
- Worked on servlets to handle client requests and carry out server-side processing (see the sketch after this section).
- Deployed the application on Tomcat Server.
- Used MVC architecture to build web application.
Environment: Java, Servlets, JSP, Tomcat, Oracle, RAD7.x, Applets, Apache ANT, XML, JavaScript, HTML.
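A minimal sketch of a servlet handling a login request and checking credentials over JDBC against Oracle, in the spirit of the bullets above; the connection details, table, column names and JSP paths are placeholders.

```java
// Hypothetical login servlet: reads form parameters, checks them via JDBC, forwards to a JSP.
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class LoginServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String user = request.getParameter("username");
        String pass = request.getParameter("password");

        // Placeholder Oracle connection details and credential check.
        String url = "jdbc:oracle:thin:@dbhost:1521:ORCL";
        try (Connection conn = DriverManager.getConnection(url, "appuser", "apppass");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT 1 FROM employees WHERE login = ? AND password_hash = ?")) {
            ps.setString(1, user);
            ps.setString(2, pass);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    request.getSession(true).setAttribute("user", user);
                    request.getRequestDispatcher("/home.jsp").forward(request, response);
                } else {
                    response.sendRedirect("login.jsp?error=1");
                }
            }
        } catch (SQLException e) {
            throw new ServletException("Login check failed", e);
        }
    }
}
```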