Hadoop Developer/Business Intelligence Data Architect Resume
SUMMARY
- 8+ years of IT experience in software development, including 3+ years as a Big Data/Hadoop Developer with strong knowledge of the Hadoop framework.
- Expertise in Hadoop architecture and its components, including HDFS, YARN, High Availability, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience with all aspects of development, from requirements discovery and initial implementation through release, enhancement, and support (SDLC & Agile techniques).
- Keen on building knowledge of emerging technologies in Analytics, Information Management, Big Data, and related areas, and on providing the best business solutions.
- Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, and ZooKeeper).
- Well versed in configuring and administering Hadoop clusters using major Hadoop distributions such as Apache Hadoop and Cloudera.
- Hands-on experience importing and exporting data between relational databases and HDFS, Hive, and HBase using Sqoop.
- Analyzed large data sets by writing Pig scripts and Hive queries.
- Evaluated new technologies in the Big Data, Analytics, and NoSQL space.
- Extensive experience with the Hadoop ecosystem and its components, including HDFS, MapReduce, YARN, Hive, Sqoop, Kafka, Spark, Oozie, Azkaban, and Airflow.
- Strong experience working with databases such as Teradata, and proficiency in writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures, and functions.
- Experienced in transporting and processing real-time event streams using Kafka and Spark Streaming (see the sketch after this list).
- Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Document (FSD).
- Excellent working experience in Scrum / Agile framework, Iterative and Waterfall project execution methodologies.
- Expertise in creating cursors, functions, procedures, packages, and triggers in PL/SQL according to business requirements.
- Experienced in working with RDBMS, OLAP, and OLTP concepts.
- Excellent understanding of Data modeling (Dimensional & Relational).
- Capable of organizing, coordinating and managing multiple tasks simultaneously.
- Experienced in working in multi-cultural environments, both within a team and individually, as project requirements demand.
- Excellent communication and interpersonal skills; self-motivated, organized, and detail-oriented; able to work well under deadlines in a changing environment and perform multiple tasks effectively and concurrently.
- Strong analytical skills with the ability to quickly understand clients' business needs; involved in meetings to gather information and requirements from clients.
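A minimal PySpark Structured Streaming sketch of the Kafka-based real-time processing mentioned above; the broker address, topic name, checkpoint path, and the presence of the spark-sql-kafka connector are assumptions, not details from the original projects.

```python
from pyspark.sql import SparkSession

# Assumes the spark-sql-kafka-0-10 connector is available on the Spark classpath.
spark = SparkSession.builder.appName("kafka-event-stream").getOrCreate()

# Subscribe to a Kafka topic (broker and topic names are placeholders).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka delivers key/value as binary; cast the value to a string for downstream parsing.
parsed = events.selectExpr("CAST(value AS STRING) AS raw_event")

# Write the stream out; in practice the sink would be HDFS, Hive, or HBase.
query = (
    parsed.writeStream
    .format("console")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
)
query.awaitTermination()
```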
TECHNICAL SKILLS
Hadoop/Big Data ecosystem: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, ZooKeeper, Cloudera Manager, Spark, Scala
NoSQL Database: HBase, Cassandra
Tools and IDEs: Eclipse, NetBeans, Toad, PuTTY, Maven, DbVisualizer, VS Code, Qlik Sense, QlikView
Languages: SQL, PL/SQL, Java, Scala, Python
Databases: Oracle, SQL Server, MySQL, DB2, PostgreSQL, Teradata
Version Control and Tracking Tools: SVN, Git
ETL Tools: OFSAA, IBM DataStage
Cloud Technologies: AWS, Azure
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Developer/Business Intelligence Data Architect
Responsibilities:
- Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
- Conferring with data scientists and other qlikstream developers to obtain information on limitations and capabilities for data processing projects.
- Designed and developed automation test scripts using Python
- Creating Data Pipelines using Azure Data Factory.
- Automating the jobs using Python.
- Creating tables and loading data in Azure Database for MySQL.
- Creating Azure Functions and Logic Apps to automate the data pipelines using Blob triggers (a sketch of such a Blob-triggered function follows this list).
- Analyzing SQL scripts and designing solutions for implementation in PySpark.
- Developed Spark code using Python (PySpark) for faster processing and testing of data.
- Used the Spark API to perform analytics on data in Hive.
- Optimizing and tuning Hive and Spark queries using data layout techniques such as partitioning, bucketing, and other advanced techniques (see the Hive layout sketch after this list).
- Data cleansing, integration, and transformation using Pig.
- Involved in exporting and importing data from local file system and RDBMS to HDFS
- Designing and coding the pattern for inserting data into the data lake.
- Moving data from on-prem HDP clusters to Azure.
- Building, installing, upgrading, and migrating petabyte-scale big data systems.
- Fixing data-related issues.
- Loading data into DB2 databases using DataStage.
- Monitoring big data and messaging systems such as Hadoop, Kafka, and Kafka MirrorMaker to ensure they operate at peak performance at all times.
- Created Hive tables, and loaded and analyzed data using Hive queries.
- Communicating regularly with the business teams to ensure that any gaps between business requirements and technical requirements are resolved.
- Reading and translating data models, querying data, identifying data anomalies, and providing root cause analysis.
- Supporting Qlik Sense reporting to gauge the performance of various KPIs/facets and assist top management in decision-making.
- Engaging in project planning and delivering on commitments.
- Running POCs on new technologies available in the market (e.g., Snowflake) to determine the best fit for the organization's needs.
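A minimal sketch of the kind of Blob-triggered Azure Function referenced above, assuming the Python programming model in which the blob binding (container path and storage connection) is declared in function.json; the names and the downstream pipeline hand-off are illustrative only.

```python
import logging

import azure.functions as func


def main(inputblob: func.InputStream) -> None:
    """Blob-triggered entry point; the binding (container path, connection string)
    is assumed to be declared in this function's function.json."""
    logging.info("Blob trigger fired for %s (%d bytes)", inputblob.name, inputblob.length)

    # Placeholder for the automation step: in the pipelines described above, this is
    # where the downstream processing (e.g. a Data Factory pipeline run) would be started.
```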
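A small PySpark sketch of the Hive data-layout tuning referenced above: creating a partitioned, bucketed table and querying it with a partition filter so only the relevant partition is scanned. Database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

# Hive support lets spark.sql() operate on Hive-managed tables.
spark = (SparkSession.builder
         .appName("hive-layout-tuning")
         .enableHiveSupport()
         .getOrCreate())

# Partition by event date and bucket by customer id so date filters prune partitions
# and joins/aggregations on customer_id shuffle less data.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_events (
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# A query that touches only one partition instead of scanning the whole table.
daily_totals = spark.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM sales_events
    WHERE event_date = '2021-01-01'
    GROUP BY customer_id
""")
daily_totals.show()
```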
Confidential
Data Warehouse Architect - Hadoop/SQL Developer
Responsibilities:
- Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
- Moving data from Oracle to HDFS using Sqoop.
- Performing periodic data profiling on critical tables to check for abnormalities.
- Created Hive tables, loaded transactional data from Oracle using Sqoop, and worked with highly unstructured and semi-structured data.
- Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data.
- Created and ran Sqoop jobs with incremental loads to populate Hive external tables (see the Sqoop sketch after this list).
- Wrote scripts to distribute queries for performance-test jobs in the Amazon data lake.
- Developed optimal strategies for distributing web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Installed and configured Apache Hadoop across multiple nodes on AWS EC2.
- Developed Pig Latin scripts to replace the existing legacy process with Hadoop, with the data fed to AWS S3.
- Designed and developed automation test scripts using Python
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Analyzed SQL scripts and designed the solution for implementation using PySpark.
- Implemented Hive GenericUDFs to incorporate business logic into Hive queries.
- Responsible for developing a data pipeline on Amazon AWS to extract data from web logs and store it in HDFS.
- Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud; performed export and import of data into S3.
- Designed the HBase row key to store text and JSON as key values, structuring the key so rows can be retrieved and scanned in sorted order (see the row-key sketch after this list).
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
- Created Hive tables and worked with them using HiveQL.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Developed syllabus/Curriculum data pipelines from Syllabus/Curriculum Web Services to HBASE and Hive tables.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in building applications using Maven and integrating with CI servers such as Jenkins to run build jobs.
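A hedged sketch of the incremental Sqoop import pattern referenced above, wrapped in Python only to keep the examples in a single language; the JDBC URL, credentials, table, and HDFS paths are placeholders.

```python
import subprocess

# Incremental append import: only rows with ORDER_ID greater than the last recorded
# value are pulled from Oracle into the HDFS directory backing a Hive external table.
# The JDBC URL, credentials, table, and paths below are placeholders.
sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop_pwd",
    "--table", "ORDERS",
    "--target-dir", "/data/raw/orders",
    "--incremental", "append",
    "--check-column", "ORDER_ID",
    "--last-value", "0",
    "--num-mappers", "4",
]
subprocess.run(sqoop_cmd, check=True)
```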
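A small illustration of the sorted-scan row-key design referenced above, using the happybase Python client (which talks to the HBase Thrift server); host, table, and column-family names are hypothetical.

```python
import happybase

# Composite row key: <entity_id>|<zero-padded timestamp> keeps all rows for an
# entity contiguous and lexicographically ordered by time, so a prefix scan
# returns them in sorted order. Host/table/column-family names are placeholders.
connection = happybase.Connection("hbase-thrift-host")
table = connection.table("documents")


def make_row_key(entity_id: str, ts: int) -> bytes:
    return f"{entity_id}|{ts:013d}".encode("utf-8")


# Store both plain text and a JSON payload as cell values.
table.put(make_row_key("cust42", 1609459200000), {
    b"d:text": b"free-form text value",
    b"d:json": b'{"status": "active", "score": 17}',
})

# The prefix scan returns this entity's rows already sorted by the timestamp suffix.
for key, data in table.scan(row_prefix=b"cust42|"):
    print(key, data)
```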
Confidential
Hadoop Analyst
Responsibilities:
- Participated in SDLC Requirements gathering, Analysis, Design, Development and Testing of application developed using AGILE methodology.
- Developing managed, external, and partitioned tables as per requirements.
- Ingested structured data into appropriate schemas and tables to support rules and analytics.
- Developing custom user-defined functions (UDFs) in Hive to transform large volumes of data according to business requirements.
- Developing Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to load data files.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from the edge node to HDFS using shell scripting.
- Implemented scripts for loading data from the UNIX file system to HDFS (see the sketch after this list).
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Actively participated in object-oriented analysis and design sessions for the project, which is based on MVC architecture using the Spring Framework.
- Developed the presentation layer using HTML, CSS, JSPs, Bootstrap, and AngularJS.
- Adopted J2EE design patterns like DTO, DAO, Command, and Singleton.
- Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring functionality.
- Generated POJO classes to map to the database tables.
- Configured Hibernate's second-level cache using EHCache to reduce the number of hits to the configuration table data.
- Used the ORM tool Hibernate to represent entities and fetching strategies for optimization.
- Implemented transaction management in the application by applying Spring Transaction and Spring AOP methodologies.
- Wrote SQL queries and stored procedures for the application to communicate with the database.
- Used the JUnit framework for unit testing of the application.
- Used Maven to build and deploy the application.
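A minimal sketch of the UNIX-file-system-to-HDFS load step referenced above, shelling out to the standard hdfs CLI from Python; the local and HDFS paths are placeholders.

```python
import subprocess

LOCAL_DIR = "/data/landing/daily_extract"   # placeholder edge-node directory
HDFS_PARENT = "/user/etl/raw"               # placeholder HDFS target directory

# Ensure the target directory exists, then copy the files; -f overwrites any
# partial files left by a previous failed run.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", HDFS_PARENT], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", LOCAL_DIR, HDFS_PARENT + "/"], check=True)
```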
Confidential
Java Developer
Responsibilities:
- Participated in gathering business requirements, analyzing the project, and creating use cases and class diagrams.
- Interacted and coordinated with the design team, business analysts, and end users of the system.
- Created sequence diagrams, collaboration diagrams, class diagrams, use cases and activity diagrams using Rational Rose for the Configuration, Cache & logging Services.
- Implemented a Tiles-based framework to present layouts to the user; created the web UI using Struts, JSP, Servlets, and custom tags.
- Designed and developed Caching and Logging service using Singleton pattern, Log4j.
- Coded various action classes in Struts and maintained deployment descriptors such as struts-config.xml, ejb-jar.xml, and web.xml.
- Used JSP, JavaScript, custom tag libraries, Tiles, and validations provided by the Struts framework.
- Wrote authentication and authorization classes and managed them in the front controller for all users according to their entitlements.
- Developed and deployed Session Beans and Entity Beans for database updates.
- Implemented caching techniques, wrote POJO classes for storing data and DAOs to retrieve it, and handled other database configuration using EJB 3.0.
- Developed stored procedures and complex packages extensively using PL/SQL and shell programs.
- Used the Struts Validator framework for all front-end validations of form entries.
- Developed SOAP-based web services for integrating with the Enterprise Information System tier.
- Designed and developed JAXB components for transfer objects.
- Prepared EJB deployment descriptors using XML.
- Involved in the configuration and usage of Apache Log4j for logging and debugging purposes.
- Wrote Action classes to service requests from the UI, populate business objects, and invoke EJBs.
- Used JAXP (DOM, XSLT) and XSD for XML data generation and presentation.