Sr. Hadoop Developer/Technology Consultant Resume
Houston, TX
PROFILE SUMMARY
- Around 9+ years of professional experience in the IT industry, involved in developing, implementing, and maintaining various web-based applications using Java and the Big Data ecosystem on Windows and Linux environments.
- Around 3 years of experience in Hadoop/Big Data technologies covering storage, querying, processing, and analysis of data.
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Knowledge in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper, and Flume.
- Experience in managing and reviewing Hadoop log files.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
- Extending Hive and Pig core functionality by writing custom UDFs (a sketch follows this list).
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
- Developed Apache Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
- Experienced in Spark Core, Spark RDDs, pair RDDs, and Spark deployment architectures.
- Experienced in performing real-time analytics on NoSQL databases like HBase and Cassandra.
- Worked on AWS EC2, EMR and S3 to create clusters and manage data using S3.
- Good knowledge in working with Impala, Storm and Kafka.
- Experienced with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
- 5+ years of experience in Java development.
- Developed and maintained web applications using the Tomcat web server.
- Experience on Source control repositories like SVN, CVS and GITHUB.
- Good Experience on SDLC (Software Development Life cycle).
- Knowledge of Google Cloud Platform.
- Used MS Word and Excel for project documentation.
- Experience in migrating on-premises systems to Windows Azure for DR on cloud using Azure Recovery Vault and Azure backups.
- Strong knowledge of Informatica PowerCenter, data warehousing, and business intelligence.
- Good level of experience in Core Java and JEE technologies such as JDBC, Servlets, and JSP.
- Expert in developing web applications using the Struts, Hibernate, and Spring frameworks.
- Hands on Experience in writing SQL and PL/SQL queries.
- Performance tuning of Vertica DB clusters; installed, upgraded, and configured the Vertica application and handled Vertica DB account setup requests for analysts.
- Good understanding of and experience with software development methodologies like Agile and Waterfall, and performed unit, regression, white-box, and black-box testing.
- Monitored ETL process jobs and validated the data loaded into the Vertica/Teradata DW.
- Experience in Web Services using XML, HTML, and SOAP.
- Involved in maintaining the Cognos 10.2 environment.
- Administration of Hadoop and Vertica clusters for structured and unstructured data warehousing.
- Worked on version control tools like CVS, GIT, SVN.
- Well experienced in projects using JIRA, testing, and the Maven, MSBuild, and Jenkins build tools.
- Experience in developing web pages using Java, JSP, Servlets, JavaScript, jQuery, AngularJS, Node, JBoss 4.2.3, XML, WebLogic, SQL, PL/SQL, JUnit, Apache Tomcat, and WebSphere.
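As an illustration of the custom Hive UDFs mentioned above, here is a minimal sketch in Scala using Hive's classic UDF API; the class name and behavior (trim and upper-case a string) are hypothetical examples, not a specific UDF from these projects.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Minimal Hive UDF: trims and upper-cases a string column; returns null for null input.
class NormalizeUpper extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
}
```

After packaging the class into a JAR, it would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize_upper AS 'NormalizeUpper', then used like any built-in function in HiveQL.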
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++
No SQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Methodology: Agile, waterfall
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS, JSON and NodeJS
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i, 10g, 11g, MS SQL Server, MySQL and DB2
Operating systems: UNIX, LINUX, Mac OS and Windows Variants
ETL Tools: Talend, Informatica, Pentaho
PROFESSIONAL EXPERIENCE
Confidential - Houston, TX
Sr. Hadoop Developer/Technology Consultant
Responsibilities:
- Involved in collecting business requirements, designing multiple data pipelines, and monitoring the data flow in the Hortonworks Ambari UI.
- Highly motivated and versatile team player with the ability to work independently and adapt quickly to the environment.
- Performed ad-hoc queries on structured data using HiveQL and used partitioning, bucketing, and join techniques in Hive for faster data access.
- Designed and developed jobs to validate the data post-migration, such as reporting fields from source and destination systems, using Spark SQL with RDDs and DataFrames/Datasets.
- Worked on Spark Streaming and Spark Structured Streaming with Apache Kafka for real-time data processing (see the sketch after this list).
- Coordinated with the team in San Antonio on a daily basis through teleconferences, discussing roadblocks, issues, and developments.
- Involved in creating external tables in Hive in compressed formats, both transactional and non-transactional.
- Worked on query performance, optimizing it using aggregations and other optimization techniques.
- Performed data quality check rules and logging techniques before and after executing the business requirement.
- Coordinated with the TMS team in gathering data from the Kafka producers team and wrote Spark Core jobs to achieve the business requirement.
- Used DBeaver as a SQL client to inspect sample data and the structure of data in the Hive database.
- Involved in reading data formats like Gzip, Avro, and Parquet and compressing the data according to the business logic by writing generic code.
- Subscribed to multiple topics in Kafka and imported data from non-transactional to transactional tables using a JDBC connection.
- Knowledge of the MLlib (Machine Learning Library) framework for auto-suggestions.
- Performed proofs of concept with NiFi to import data from Kafka to HDFS.
- Used NiFi to export data from AWS S3 to RDBMS and Glacier.
- Developed NiFi workflow templates to replace Oozie workflows for Sqoop, Kafka, and Glacier.
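Below is a minimal sketch, in Scala, of the kind of Structured Streaming pipeline referenced above: it subscribes to a Kafka topic and lands parsed events on HDFS as Parquet. The broker address, topic name, schema, and paths are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object ClickstreamStream {
  def main(args: Array[String]): Unit = {
    // Requires the spark-sql-kafka connector on the classpath.
    val spark = SparkSession.builder().appName("KafkaToHdfsStream").getOrCreate()
    import spark.implicits._

    // Schema of the JSON messages on the topic (illustrative).
    val schema = new StructType()
      .add("eventId", StringType)
      .add("eventTime", TimestampType)
      .add("payload", StringType)

    // Subscribe to the Kafka topic; broker and topic names are assumptions.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka:9092")
      .option("subscribe", "clickstream")
      .load()

    // Kafka delivers key/value as binary; cast the value and parse it as JSON.
    val events = raw.selectExpr("CAST(value AS STRING) AS json")
      .select(from_json($"json", schema).as("event"))
      .select("event.*")

    // Write to HDFS as Parquet, with a checkpoint for fault-tolerant progress tracking.
    events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/clickstream")
      .option("checkpointLocation", "hdfs:///checkpoints/clickstream")
      .start()
      .awaitTermination()
  }
}
```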
Confidential - Dallas, TX
Big Data Engineer
Responsibilities:
- Gathered business requirements and designed and developed the data ingestion layer and presentation layer.
- Communicated clearly and regularly with management and technical support colleagues while developing business modules.
- Supported AllState technical and business teams' specific data and reporting needs on a global scale.
- Developed Spark applications for data transformation and loading into HDFS using RDDs, DataFrames, and Datasets (see the sketch after this list).
- Good understanding of Data Mining and Machine Learning techniques.
- Worked on multiple clusters in managing the Data in HDFS for Data Analytics.
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala. Performed clustering, regression, and classification using the machine learning libraries Mahout and MLlib (Spark).
- Reviewed and managed Hadoop log files from multiple machines using Flume.
- Experienced in handling large datasets during the ingestion process using partitions, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, and transformations.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Developed shell scripts for adding dynamic partitions to the Hive stage table, verifying JSON schema changes of source files, and verifying duplicate files in the source location.
- Configured the internal load balancer, load-balanced sets, and Azure Traffic Manager.
- Expertise in extracting, transforming, and loading data from Oracle, DB2, SQL Server, MS Access, Excel, flat files, and XML using Talend.
- Used SQL Azure for Backend operations and data persistence.
- Implemented Elasticsearch on the Hive data warehouse platform.
- Involved in analyzing log data to predict errors using Apache Spark.
- Experience in using ORC, Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
- Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Worked with Amazon Web Services (AWS) cloud services such as EC2, S3, EMR, EBS, RDS, and VPC.
- Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
- Used Impala and wrote queries for fetching data from Hive tables.
- Developed several MapReduce jobs using the Java API.
- Extracted data from Teradata into HDFS/databases/dashboards using Spark Streaming.
- Well versed in database and data warehouse concepts like OLTP, OLAP, and star and snowflake schemas.
- Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize search.
- Built near-real-time Solr indexes on HBase and HDFS.
- Developed Pig and Hive UDFs to implement business logic for processing the data as per requirements.
- Developed Oozie bundles to schedule Pig, Sqoop, and Hive jobs to create data pipelines.
- Implemented the project using Agile methodology and attended daily Scrum meetings.
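A minimal sketch, in Scala, of the kind of Spark transformation-and-load job described above; the HDFS paths, column names (policy_id, load_date), and cleansing rules are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object PolicyIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PolicyIngest").getOrCreate()

    // Read raw CSV landed on HDFS; path and schema inference are illustrative.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/policies/")

    // Typical cleansing/transformation step before loading to the warehouse layer.
    val cleaned = raw
      .filter(col("policy_id").isNotNull)
      .withColumn("load_date", current_date())
      .dropDuplicates("policy_id")

    // Write partitioned Parquet back to HDFS for the presentation layer.
    cleaned.write
      .mode("overwrite")
      .partitionBy("load_date")
      .parquet("hdfs:///warehouse/policies/")

    spark.stop()
  }
}
```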
Confidential - New York City, NY
Spark/Big Data Developer
Responsibilities:
- Wrote MapReduce jobs to parse the web logs stored in HDFS.
- Developed services to run the MapReduce jobs on an as-required basis.
- Imported and exported data into HDFS and Hive using Sqoop.
- Responsible for managing data coming from different sources.
- Worked with Apache Spark, which provides a fast, general engine for large-scale data processing, integrated with the functional programming language Scala.
- Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and wrote Hive UDFs.
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Responsible for Spark Streaming configuration based on the type of input source.
- Experienced with AWS services for managing applications in the cloud and creating or modifying instances.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the sketch after this list).
- Developed ETL processes using Spark, Scala, Hive, and HBase.
- Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
- Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
- Used codecs like Snappy and LZO to store data in HDFS to improve performance.
- Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery, and backup.
- Created HBase tables to store data in variable formats coming from different legacy systems.
- Used Hive for transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Developed Sqoop jobs to load data from RDBMS into HDFS and Hive.
- Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
- Involved in loading data from UNIX file system and FTP to HDFS.
- Imported the data from different sources like AWS S3, LFS into Spark RDD.
- Worked on Kerberos authentication to establish a more secure network communication on the cluster.
- Performed troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Worked with Network, database, application and BI teams to ensure data quality and availability.
- Worked with Elastic MapReduce and set up the environment on AWS EC2 instances.
- Experienced in NoSQL databases like HBase and MongoDB, and experienced with the Hortonworks distribution of Hadoop.
- Developed ETL jobs to integrate data from various sources and load into the warehouse using Informatica 9.1.
- Experienced in Creating ETL Mappings in Informatica.
- Experienced in working with various transformations like Filter, Router, Expression, Update Strategy, etc. in Informatica.
- Scheduled the ETL jobs using ESP scheduler.
- Worked in Agile methodology and actively participated in daily Scrum meetings.
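A minimal sketch, in Scala, of converting a HiveQL query into equivalent Spark transformations, as referenced above; the web_logs table and its columns are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSpark")
      .enableHiveSupport() // allows reading existing Hive tables
      .getOrCreate()

    // Original HiveQL-style query (table and column names are illustrative):
    //   SELECT page, COUNT(*) AS hits FROM web_logs WHERE status = 200 GROUP BY page;
    val viaSql = spark.sql(
      "SELECT page, COUNT(*) AS hits FROM web_logs WHERE status = 200 GROUP BY page")

    // The same logic expressed as DataFrame transformations instead of SQL text.
    val viaDf = spark.table("web_logs")
      .filter(col("status") === 200)
      .groupBy("page")
      .agg(count(lit(1)).as("hits"))

    // Both forms produce the same result; the transformation form composes better in code.
    viaSql.show(5)
    viaDf.show(5)
    spark.stop()
  }
}
```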
Confidential, FL
Hadoop/Big Data Engineer
Responsibilities:
- Developed Hive queries on external tables to perform various analyses (see the sketch after this list).
- Developed workflows in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Mentored the analyst and test teams in writing Hive queries.
- Coded complex Oracle stored procedures, functions, packages, and cursors for client-specific applications.
- Involved in database migrations to transfer data from one database to another and in the complete virtualization of many client applications.
- Prepared developer (unit) test cases and executed developer testing.
- Created/modified shell scripts for scheduling various data cleansing scripts and ETL loading processes.
- Supported and assisted QA engineers in understanding, testing, and troubleshooting.
- Wrote build scripts using Ant and participated in the deployment of one or more production systems.
- Provided production rollout support, which includes monitoring the solution post go-live and resolving any issues discovered by the client and client services teams.
- Documented operational problems by following standards and procedures, using JIRA as the software reporting tool.
- Experience with professional software engineering practices and best practices for the full software development life cycle, including coding standards, code reviews, source control management, and build processes.
- Work closely with various levels of individuals to coordinate and prioritize multiple projects. Estimate scope, schedule and track projects throughout SDLC.
- Worked in a team for Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
- Assessed existing and available data warehousing technologies and methods to ensure the data warehouse/BI architecture meets the needs of the business unit and enterprise and allows for business growth.
- Involved in source system analysis, data analysis, and data modeling through to ETL (Extract, Transform, and Load).
- Experienced in working with data analytics, web scraping, and data extraction in Python.
- Designed and implemented database cloning using Python and built backend support for applications using shell scripts.
- Worked on various compression techniques like GZIP and LZO.
- Designed and implemented batch jobs using Sqoop, MR2, Pig, and Hive.
- Implemented HBase on top of HDFS to perform real-time analytics.
- Handled Avro data files using Avro tools and MapReduce.
- Developed data pipelines using chained mappers.
- Developed custom loaders and storage classes in Pig to work with various data formats like JSON, XML, CSV, etc.
- Actively involved in SDLC phases (design, development, testing), code reviews, etc.
- Actively involved in Scrum meetings and followed Agile methodology for implementation.
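A minimal sketch, in Scala with Spark's Hive support, of declaring and querying the kind of external Hive table mentioned above; the claims_raw table, its columns, and the HDFS location are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ExternalTableAnalysis {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExternalTableAnalysis")
      .enableHiveSupport()
      .getOrCreate()

    // Declare an external table over files already sitting in HDFS;
    // dropping the table later leaves the underlying data untouched.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS claims_raw (
        |  claim_id STRING,
        |  amount   DOUBLE,
        |  state    STRING
        |)
        |STORED AS PARQUET
        |LOCATION 'hdfs:///data/claims/'""".stripMargin)

    // A typical analysis query over the external table.
    val byState = spark.sql(
      "SELECT state, COUNT(*) AS claims, SUM(amount) AS total FROM claims_raw GROUP BY state")

    byState.show(20)
    spark.stop()
  }
}
```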
Confidential
Java/Hadoop Developer
Responsibilities:
- Used Sqoop for importing and exporting data between MySQL/Oracle 11g and HDFS/Hive.
- Optimized and performance-tuned Hive queries and implemented complex transformations by writing UDFs in Pig and Hive.
- Involved in running MapReduce jobs for processing millions of records.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Created partitions and buckets on Hive tables to improve performance while running Hive queries (see the sketch after this list).
- Developed custom loaders and storage classes in Pig to work with various data formats like JSON, XML, CSV, etc.
- Responsible for analyzing the performance of Hive queries using Impala.
- Developed a Flume ETL job for handling data from an HTTP source with HDFS as the sink.
- Automated the Hadoop pipeline using Oozie and scheduled it with a coordinator based on time frequency and data availability.
- Monitoring of Hadoop Cluster using Cloudera Manager.
- Loaded and transformed large sets of semi-structured and unstructured data, including sequence files and XML files, and worked on Avro and Parquet file formats using compression techniques like Snappy, Gzip, and Zlib.
- Worked on building Hadoop cluster in AWS Cloud on multiple EC2 instances.
- Used Amazon Simple Storage Service (S3) for storing and accessing data for the Hadoop cluster.
- Used JIRA for bug tracking and Bitbucket to check in and check out code changes.
- Responsible for developing the SQL queries required for JDBC.
- Designed the database, worked on DB2, and executed DDL and DML statements.
- Actively participated in architecture framework design, coding, and test plan development.
- Strictly followed the Waterfall development methodology for implementing projects.
- Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
- Involved in developing training presentations for developers (offshore support), QA, and production support.
- Presented the process logical and physical flow to various teams using PowerPoint and Visio diagrams.
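A minimal sketch, in Scala, of the partitioned and bucketed table setup described above, expressed through Spark's DataFrame writer; the staging path, table, and column names are hypothetical, and Spark's bucketing layout is its own rather than Hive's classic bucketing.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedOrdersTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedOrdersTable")
      .enableHiveSupport()
      .getOrCreate()

    // Staging data previously landed on HDFS (e.g. by a Sqoop import); path is illustrative.
    val orders = spark.read.parquet("hdfs:///staging/orders/")

    // Partition by load date so date-filtered queries prune whole directories,
    // and bucket by customer_id so joins/aggregations on it shuffle less.
    orders.write
      .partitionBy("load_date")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("orc")
      .mode("overwrite")
      .saveAsTable("orders_curated")

    // Only the matching partition's files are scanned for this query.
    spark.sql(
      "SELECT customer_id, SUM(amount) AS total FROM orders_curated " +
      "WHERE load_date = '2016-01-01' GROUP BY customer_id").show()

    spark.stop()
  }
}
```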
Confidential
Java Developer
Responsibilities:
- Implemented several design patterns such as the Observer, Factory, Singleton, and Facade patterns.
- Utilized Agile methodologies to manage full life-cycle development of the project.
- Interacted with the business analyst and host to understand the requirements, using Agile methodologies and Scrum meetings to keep track of and optimize the end client's needs.
- Created Use Case Diagrams, Class Diagrams, Activity Diagrams during the design phase.
- Developed the project using MVC Design pattern.
- Designed and Developed Server-side Components (DAO, Session Beans) Using J2EE.
- Worked with Core Java concepts like Collections Framework, multithreading, memory management.
- Used JDBC connectivity and JDBC Statements, PreparedStatements, and CallableStatements for querying, inserting, updating, and deleting data in Oracle databases (see the sketch at the end of this section).
- Developed Front-end Screens using HTML, CSS, and JavaScript.
- Developed Date Time Picker using Object Oriented JavaScript extensively.
- Code reviews and refactoring were done during development, and the checklist was strictly adhered to.
- Used Jenkins for continuous integration.
- Used Subversion as the version control system for the application.
- Used Log4j for logging and tracing the code.
- Client-side validations were done using JavaScript.
- Optimized XML parsers like SAX and DOM for the production data.
- Good understanding of Teradata MPP architecture, including partitioning and primary indexes.
- Good knowledge in Teradata Unity, Teradata Data Mover, OS PDE Kernel internals, Backup and Recovery.
- Implemented a JMS topic to receive input in the form of XML and parsed it against a common XSD.
- Used JDBC Connections and WebSphere Connection pool for database access.
- Developed and modified several database procedures, triggers, and views to implement the business logic for the application.
- Used TOAD to monitor query turnaround times and to test all the connections.
- Prepared the test plans and executed test cases for unit, integration and system testing.
- Developed multiple unit and integrations tests using Mockito and Easy Mock.
- Used JIRA for reporting bugs in the application.
Environment: Java, J2EE, Servlets, JSP, Struts, Spring, Hibernate, JDBC, JMS, JavaScript, XSLT, HTML, CSS, SAX, DOM, XML, UML, TOAD, Mockito, Oracle, Eclipse RCP, JIRA, WebSphere, Unix/Windows.
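A minimal sketch of the PreparedStatement-based JDBC access mentioned above, written in Scala for consistency with the other sketches in this document even though the original work was in Java; the connection URL, credentials, table, and column names are placeholders.

```scala
import java.sql.DriverManager

object CustomerDao {
  def main(args: Array[String]): Unit = {
    // Requires the Oracle JDBC driver on the classpath; connection details are placeholders.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "secret")
    try {
      // PreparedStatement binds parameters safely instead of concatenating SQL strings.
      val ps = conn.prepareStatement(
        "UPDATE customers SET status = ? WHERE customer_id = ?")
      ps.setString(1, "ACTIVE")
      ps.setLong(2, 12345L)
      val updated = ps.executeUpdate()
      println(s"Rows updated: $updated")
      ps.close()
    } finally {
      conn.close()
    }
  }
}
```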
Confidential
Junior Java Developer
Responsibilities:
- Extensive Involvement in Requirement Analysis and system implementation.
- Actively involved in SDLC phases like Analysis, Design and Development.
- Responsible for developing modules and assist in deployment as per the client’s requirements.
- Application is implemented using JSP and servlets are used for implementing Business logic.
- Developed utility and helper classes and Server side Functionalities using servlets.
- Created DAO Classes and Written Various SQL queries to perform DML Operations on the data as per the requirements.
- Created Custom Exceptions and implemented Exception handling using Try, Catch and Finally Blocks.
- Developed user interface using JSP, JavaScript and CSS Technologies.
- Implemented User Session tracking in JSP.
- Involved in Designing DB Schema for the application.
- Implemented Complex SQL Queries, Reusable Triggers, Functions, Stored procedures using PL/SQL.
- Worked in pair programming, Code reviewing and Debugging.
- Involved in Tool development, Testing and Bug Fixing.
- Performed unit testing for various modules.
- Involved in UAT and production deployments and support activities.