Hadoop Developer/Business Intelligence Data Architect Resume
SUMMARY
- 8+ years of IT experience in software development, including 3+ years as a Big Data/Hadoop Developer with strong knowledge of the Hadoop framework.
- Expertise in Hadoop architecture and its components, including HDFS, YARN, High Availability, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience with all aspects of development, from requirements discovery and initial implementation through release, enhancement, and support (SDLC & Agile techniques).
- Keen on building knowledge of emerging technologies in Analytics, Information Management, Big Data, and related areas, and on providing the best business solutions.
- Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, and ZooKeeper).
- Well versed in configuring and administering Hadoop clusters using major Hadoop distributions such as Apache Hadoop and Cloudera.
- Hands-on experience importing and exporting data between relational databases and HDFS, Hive, and HBase using Sqoop.
- Analyzed large data sets by writing Pig scripts and Hive queries.
- Evaluated new technologies in the Big Data, Analytics, and NoSQL space.
- Extensive experience with the Hadoop ecosystem and its components, including HDFS, MapReduce, YARN, Hive, Sqoop, Kafka, Spark, Oozie, Azkaban, and Airflow.
- Strong experience working with databases such as Teradata, and proficiency in writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures, and functions.
- Experienced in transporting and processing real-time event streams using Kafka and Spark Streaming (see the sketch after this list).
- Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Document (FSD).
- Excellent working experience in Scrum / Agile framework, Iterative and Waterfall project execution methodologies.
- Expertise in creating cursors, functions, procedures, packages, and triggers in PL/SQL according to business requirements.
- Experienced in working with RDBMS, OLAP, and OLTP concepts.
- Excellent understanding of Data modeling (Dimensional & Relational).
- Capable of organizing, coordinating and managing multiple tasks simultaneously.
- Experienced in working in multi-cultural environments, both within a team and individually, as project requirements demand.
- Excellent communication and interpersonal skills; self-motivated, organized, and detail-oriented; able to work well under deadlines in a changing environment and perform multiple tasks effectively and concurrently.
- Strong analytical skills with the ability to quickly understand clients' business needs; involved in meetings to gather information and requirements from clients.
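A minimal PySpark Structured Streaming sketch of the Kafka-based real-time processing mentioned above; the broker address, topic name, checkpoint path, and the presence of the spark-sql-kafka connector are assumptions, not details from the original projects.

```python
from pyspark.sql import SparkSession

# Assumes the spark-sql-kafka-0-10 connector is available on the Spark classpath.
spark = SparkSession.builder.appName("kafka-event-stream").getOrCreate()

# Subscribe to a Kafka topic (broker and topic names are placeholders).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka delivers key/value as binary; cast the value to a string for downstream parsing.
parsed = events.selectExpr("CAST(value AS STRING) AS raw_event")

# Write the stream out; in practice the sink would be HDFS, Hive, or HBase.
query = (
    parsed.writeStream
    .format("console")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
)
query.awaitTermination()
```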
TECHNICAL SKILLS
Hadoop/Big Data ecosystem: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, ZooKeeper, Cloudera Manager, Spark, Scala
NoSQL Database: HBase, Cassandra
Tools and IDEs: Eclipse, NetBeans, Toad, PuTTY, Maven, DbVisualizer, VS Code, Qlik Sense, QlikView
Languages: SQL, PL/SQL, Java, Scala, Python
Databases: Oracle, SQL Server, MySQL, DB2, PostgreSQL, Teradata
Version Control and Tracking Tools: SVN, Git
ETL Tools: OFSAA, IBM DataStage
Cloud Technologies: AWS, Azure
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Developer/Business Intelligence Data Architect
Responsibilities:
- Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
- Conferring with data scientists and other qlikstream developers to obtain information on limitations and capabilities for data processing projects.
- Designed and developed automation test scripts using Python
- Creating Data Pipelines using Azure Data Factory.
- Automating the jobs using Python.
- Creating tables and loading data in Azure Database for MySQL.
- Creating Azure Functions and Logic Apps to automate the data pipelines using Blob triggers (a sketch of such a Blob-triggered function follows this list).
- Analyzing SQL scripts and designing solutions for implementation in PySpark.
- Developed Spark code using Python (PySpark) for faster processing and testing of data.
- Used the Spark API to perform analytics on data in Hive.
- Optimizing and tuning Hive and Spark queries using data layout techniques such as partitioning, bucketing, and other advanced techniques (see the Hive layout sketch after this list).
- Data cleansing, integration, and transformation using Pig.
- Involved in exporting and importing data from local file system and RDBMS to HDFS
- Designing and coding the pattern for inserting data into the data lake.
- Moving data from on-prem HDP clusters to Azure.
- Building, installing, upgrading, and migrating petabyte-scale big data systems.
- Fixing data-related issues.
- Loading data into DB2 databases using DataStage.
- Monitoring big data and messaging systems such as Hadoop, Kafka, and Kafka MirrorMaker to ensure they operate at peak performance at all times.
- Created Hive tables, and loaded and analyzed data using Hive queries.
- Communicating regularly with the business teams to ensure that any gaps between business requirements and technical requirements are resolved.
- Reading and translating data models, querying data, identifying data anomalies, and providing root cause analysis.
- Supporting Qlik Sense reporting to gauge the performance of various KPIs/facets and assist top management in decision-making.
- Engaging in project planning and delivering on commitments.
- Running POCs on new technologies available in the market (e.g., Snowflake) to determine the best fit for the organization's needs.
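A minimal sketch of the kind of Blob-triggered Azure Function referenced above, assuming the Python programming model in which the blob binding (container path and storage connection) is declared in function.json; the names and the downstream pipeline hand-off are illustrative only.

```python
import logging

import azure.functions as func


def main(inputblob: func.InputStream) -> None:
    """Blob-triggered entry point; the binding (container path, connection string)
    is assumed to be declared in this function's function.json."""
    logging.info("Blob trigger fired for %s (%d bytes)", inputblob.name, inputblob.length)

    # Placeholder for the automation step: in the pipelines described above, this is
    # where the downstream processing (e.g. a Data Factory pipeline run) would be started.
```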
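A small PySpark sketch of the Hive data-layout tuning referenced above: creating a partitioned, bucketed table and querying it with a partition filter so only the relevant partition is scanned. Database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

# Hive support lets spark.sql() operate on Hive-managed tables.
spark = (SparkSession.builder
         .appName("hive-layout-tuning")
         .enableHiveSupport()
         .getOrCreate())

# Partition by event date and bucket by customer id so date filters prune partitions
# and joins/aggregations on customer_id shuffle less data.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_events (
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# A query that touches only one partition instead of scanning the whole table.
daily_totals = spark.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM sales_events
    WHERE event_date = '2021-01-01'
    GROUP BY customer_id
""")
daily_totals.show()
```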
Confidential
Data Warehouse Architect - Hadoop/SQL Developer
Responsibilities:
- Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
- Moving data from Oracle to HDFS using Sqoop.
- Performing periodic data profiling on critical tables to check for abnormalities.
- Created Hive tables, loaded transactional data from Oracle using Sqoop, and worked with highly unstructured and semi-structured data.
- Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data.
- Created and ran Sqoop jobs with incremental loads to populate Hive external tables (see the Sqoop sketch after this list).
- Wrote scripts to distribute queries for performance-test jobs in the Amazon data lake.
- Developed optimal strategies for distributing web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Installed and configured Apache Hadoop across multiple nodes on AWS EC2.
- Developed Pig Latin scripts to replace the existing legacy process with Hadoop, with the data fed to AWS S3.
- Designed and developed automation test scripts using Python
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Analyzed SQL scripts and designed the solution for implementation using PySpark.
- Implemented Hive GenericUDFs to incorporate business logic into Hive queries.
- Responsible for developing a data pipeline on Amazon AWS to extract data from web logs and store it in HDFS.
- Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud; performed export and import of data into S3.
- Designed the HBase row key to store text and JSON as key values, structuring the key so rows can be retrieved and scanned in sorted order (see the row-key sketch after this list).
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
- Created Hive tables and worked with them using HiveQL.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Developed syllabus/Curriculum data pipelines from Syllabus/Curriculum Web Services to HBASE and Hive tables.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in building applications using Maven and integrating with CI servers such as Jenkins to run build jobs.
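A hedged sketch of the incremental Sqoop import pattern referenced above, wrapped in Python only to keep the examples in a single language; the JDBC URL, credentials, table, and HDFS paths are placeholders.

```python
import subprocess

# Incremental append import: only rows with ORDER_ID greater than the last recorded
# value are pulled from Oracle into the HDFS directory backing a Hive external table.
# The JDBC URL, credentials, table, and paths below are placeholders.
sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop_pwd",
    "--table", "ORDERS",
    "--target-dir", "/data/raw/orders",
    "--incremental", "append",
    "--check-column", "ORDER_ID",
    "--last-value", "0",
    "--num-mappers", "4",
]
subprocess.run(sqoop_cmd, check=True)
```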
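A small illustration of the sorted-scan row-key design referenced above, using the happybase Python client (which talks to the HBase Thrift server); host, table, and column-family names are hypothetical.

```python
import happybase

# Composite row key: <entity_id>|<zero-padded timestamp> keeps all rows for an
# entity contiguous and lexicographically ordered by time, so a prefix scan
# returns them in sorted order. Host/table/column-family names are placeholders.
connection = happybase.Connection("hbase-thrift-host")
table = connection.table("documents")


def make_row_key(entity_id: str, ts: int) -> bytes:
    return f"{entity_id}|{ts:013d}".encode("utf-8")


# Store both plain text and a JSON payload as cell values.
table.put(make_row_key("cust42", 1609459200000), {
    b"d:text": b"free-form text value",
    b"d:json": b'{"status": "active", "score": 17}',
})

# The prefix scan returns this entity's rows already sorted by the timestamp suffix.
for key, data in table.scan(row_prefix=b"cust42|"):
    print(key, data)
```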
Confidential
Hadoop Analyst
Responsibilities:
- Participated in SDLC Requirements gathering, Analysis, Design, Development and Testing of application developed using AGILE methodology.
- Developing managed, external, and partitioned tables as per requirements.
- Ingested structured data into appropriate schemas and tables to support rules and analytics.
- Developing custom user-defined functions (UDFs) in Hive to transform large volumes of data according to business requirements.
- Developing Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to load data files.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from the edge node to HDFS using shell scripting.
- Implemented scripts for loading data from the UNIX file system to HDFS (see the sketch after this list).
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Actively participated in object-oriented analysis and design sessions for the project, which is based on MVC architecture using the Spring Framework.
- Developed the presentation layer using HTML, CSS, JSPs, Bootstrap, and AngularJS.
- Adopted J2EE design patterns like DTO, DAO, Command, and Singleton.
- Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring functionality.
- Generated POJO classes to map to the database tables.
- Configured Hibernate's second-level cache using EHCache to reduce the number of hits to the configuration table data.
- Used the ORM tool Hibernate to represent entities and fetching strategies for optimization.
- Implemented transaction management in the application by applying Spring Transaction and Spring AOP methodologies.
- Wrote SQL queries and stored procedures for the application to communicate with the database.
- Used the JUnit framework for unit testing of the application.
- Used Maven to build and deploy the application.
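A minimal sketch of the UNIX-file-system-to-HDFS load step referenced above, shelling out to the standard hdfs CLI from Python; the local and HDFS paths are placeholders.

```python
import subprocess

LOCAL_DIR = "/data/landing/daily_extract"   # placeholder edge-node directory
HDFS_PARENT = "/user/etl/raw"               # placeholder HDFS target directory

# Ensure the target directory exists, then copy the files; -f overwrites any
# partial files left by a previous failed run.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", HDFS_PARENT], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", LOCAL_DIR, HDFS_PARENT + "/"], check=True)
```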
Confidential
Java Developer
Responsibilities:
- Participated in gathering business requirements, analyzing the project, and creating use cases and class diagrams.
- Interacted and coordinated with the design team, business analysts, and end users of the system.
- Created sequence diagrams, collaboration diagrams, class diagrams, use cases and activity diagrams using Rational Rose for the Configuration, Cache & logging Services.
- Implemented a Tiles-based framework to present layouts to the user; created the web UI using Struts, JSP, Servlets, and custom tags.
- Designed and developed Caching and Logging service using Singleton pattern, Log4j.
- Coded various action classes in Struts and maintained deployment descriptors such as struts-config.xml, ejb-jar.xml, and web.xml.
- Used JSP, JavaScript, custom tag libraries, Tiles, and validations provided by the Struts framework.
- Wrote authentication and authorization classes and managed them in the front controller for all users according to their entitlements.
- Developed and deployed Session Beans and Entity Beans for database updates.
- Implemented caching techniques, wrote POJO classes for storing data and DAOs to retrieve it, and handled other database configuration using EJB 3.0.
- Developed stored procedures and complex packages extensively using PL/SQL and shell programs.
- Used the Struts Validator framework for all front-end validations of form entries.
- Developed SOAP-based web services for integrating with the Enterprise Information System tier.
- Designed and developed JAXB components for transfer objects.
- Prepared EJB deployment descriptors using XML.
- Involved in the configuration and usage of Apache Log4j for logging and debugging purposes.
- Wrote Action classes to service requests from the UI, populate business objects, and invoke EJBs.
- Used JAXP (DOM, XSLT) and XSD for XML data generation and presentation.