- Over 8 years of IT experience in the analysis, design, implementation, development, maintenance, and testing of large-scale applications using SQL, Hadoop, and other Big Data technologies.
- 4+ years of experience in developing large-scale applications using Hadoop and other Big Data tools.
- In-depth understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce (MRv1, YARN), Cloudera, Pig, Hive, HBase, Sqoop, Flume, Kafka, Impala, and Oozie, and with programming in Spark using Python and Scala.
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using the Cloudera and Hortonworks distributions of Hadoop.
- Experience in using newer open-source technologies such as Kylo for self-service data ingestion and Apache Kylin for building OLAP cubes.
- Experience working with the AWS stack (S3, EMR, EC2, SQS, and Redshift).
- Experience working with Elasticsearch, Logstash, and Kibana.
- Experience in developing solutions to analyze large data sets efficiently.
- Experience in using Apache Kafka for collecting, aggregating and moving large amounts of data.
- Experience in using Sqoop to import data from relational database management systems (RDBMS) into HDFS and to export data from HDFS back to RDBMS.
- Experience in data analysis using Hive, Pig Latin, and Impala.
- Created Hive, Pig, SQL and HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Worked on developing ETL processes to load data from multiple data sources to HDFS using Kafka and Sqoop.
- Performed structural modifications using MapReduce and Hive, and analyzed data using visualization and reporting tools such as Tableau.
- Used AWS Lambda to develop APIs for managing servers and running code in AWS.
- Experience using Ansible to automate deployment processes.
- Experience in scheduling and monitoring jobs using Oozie, Airflow, and ZooKeeper.
- Experience with SQL and NoSQL databases (HBase and Cassandra).
- Developed Spark scripts to import large files from AWS S3 buckets.
- Extensively used Informatica PowerCenter for end-to-end data warehousing ETL routines, including writing custom scripts and loading data from flat files.
- Experience working with Java, J2EE, Spring and Hibernate.
- Solid experience providing production support for applications.
- Involved in the complete software development life cycle (SDLC), developing applications using Agile and Waterfall methodologies.
- Able to work effectively in cross-functional team environments, with excellent communication and interpersonal skills.
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, ZooKeeper, Spark, Spark Streaming, and Kafka.
Hadoop Distributions: Hortonworks, Cloudera, and EMR.
NoSQL Databases: HBase and Cassandra.
Scripting Languages: Shell, Python, Perl.
Operating Systems: UNIX, Linux, Ubuntu, Windows Vista/7/8/10.
Programming Languages: Python, C, SQL, PL/SQL, HQL, and Pig Latin.
IDEs and Dev Tools: Eclipse, SoapUI, Ant, Maven, PyCharm, and Jenkins.
Java Technologies: Servlets, JDBC, Spring, Hibernate, SOAP/REST services
Databases: Oracle, SQL Server and MySQL.
Frameworks: Spring, Hibernate and Struts.
Web Services and Servers: SOAP, JMS, Apache Tomcat, WebLogic, JBoss, Apache HTTP Server.
Methodologies: OOAD, UML, Design Patterns.
Confidential, Minnesota, MN
Big Data Engineer
- Reviewed source-to-transformation mapping (STM) documents to understand functionality and requirements, and took part in gathering business requirements.
- Designed, developed, and maintained software solutions on the Hadoop cluster.
- Designed a data flow to pull data from a third-party vendor through a REST API using OAuth authentication.
- Developed a data pipeline using Spark, Hive, Pig, and HBase to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.
- Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
- Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and Spark on YARN.
- Implemented an incremental merge process in Apache Spark by converting the data to key-value pairs.
- Used AWS SQS to consume data from S3 buckets.
- Used Spark Streaming to ingest data into the Spark engine.
- Imported data from sources such as AWS S3 and the local file system into Spark RDDs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Involved in using Sqoop for importing and exporting data between RDBMS and HDFS.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Worked on moving the data pipelines from CDH to run on AWS EMR.
- Worked with different execution engines such as Spark, MapReduce, and Tez.
- Used Airflow for scheduling the Hive, Spark and MapReduce jobs.
- Tested the data pipelines to ensure high data quality.
- Used Impala and Presto for querying the datasets.
- Developed Python applications for multiple platforms and handled data manipulation using Python scripts.
- Used Elasticsearch and Logstash to ingest logs through Amazon Kinesis Data Firehose, and the Kibana UI to search the logs for job failures and gather other key metrics.
- Worked on ingesting, parsing, and loading data from CSV and JSON files using Hive and Spark.
- Built aggregate tables using Hive and Spark that feed dashboards tracking various business KPIs.
- Created an Ansible playbook to automate deploying Airflow on multiple EC2 instances.
- Worked on a POC of the self-service tool Kylo, which integrates metadata management, governance, and security, for data ingestion and data preparation.
- Created dashboards and data sets in Tableau to support business decisions and to estimate sales by location.
- Primary contributor in designing, coding, testing, debugging, documenting and supporting all types of applications consistent with established specifications and business requirements to deliver business value.
- Demonstrated Hadoop best practices and broad knowledge of technical solutions, design patterns, and code for medium/complex applications deployed in Hadoop production.
Environment: Hive, Pig, Spark SQL, Spark, EMR, HBase, Sqoop, SQS, S3, Cloudera, Hue, Python, Eclipse, Scala, Maven, HDFS, Jenkins.
Confidential, O'Fallon, Missouri
- Worked with business users to gather and define business requirements and analyze possible technical solutions.
- Gathered system design requirements, and designed and wrote system specifications.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Worked on Cloudera distribution of Hadoop (CDH) 5.3.0.
- Developed workflows in Oozie to automate loading data into HDFS and pre-processing it with Pig.
- Designed workflows with many sessions using decision, assignment, event-wait, and event-raise tasks.
- Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions.
- Designed and developed UNIX shell scripts as part of the ETL process to compare control totals and to automate loading, pulling, and pushing data between servers.
- Used Hive for ETL which involved static and dynamic partitions.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run as MapReduce jobs in the backend.
- Used window functions in Spark and Hive to perform complex data calculations to satisfy the business requirements.
- Wrote multiple Hive UDFs using core Java and OOP concepts, and Spark functions within Python programs.
- Worked with scripting technologies such as Python and UNIX shell scripts.
- Mentored the analyst and test teams in writing Hive queries.
- Stored the data in Avro, Parquet, and ORC file formats and used Snappy and LZO for compression.
- Involved in Hadoop cluster administration, including adding and removing cluster nodes, capacity planning, performance tuning, and cluster monitoring.
- Developed Spark scripts using Python and Scala shell commands as required.
- Worked on integrating Hive with HBase tables, on top of which a search engine application was built.
- Collected, aggregated, and moved data using Apache Kafka.
- Used Kafka to load data into HDFS and move data back to S3 after data processing.
- Implemented a data interface to retrieve customer information through a RESTful API and pre-process the data.
- Used Apache Kylin to build OLAP cubes on top of data in Hadoop/Hive and store them in HBase for sub-second query latency.
- Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
- Developed scripts to automate end-to-end data management and synchronization between all the clusters.
- Stored various time-series data in HBase and performed time-based analytics to improve query retrieval time.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
Environment: Hadoop, HDFS, Hive, Pig, Flume, Sqoop, Spark, Scala, MapReduce, Cloudera, Avro, Parquet, Snappy, Zookeeper, CDH, Kafka, NoSQL, HBase, Java (JDK 1.6), Eclipse, Python, MySQL.
Confidential, Dallas, Texas.
- Evaluated the suitability of Hadoop and its ecosystem for the project by implementing and validating various proof-of-concept (POC) applications, with a view to adopting them as part of the Big Data Hadoop initiative.
- Wrote complex SQL queries and PL/SQL stored procedures and converted them to ETL tasks.
- Created and maintained documents related to business processes, mapping design, data profiles and tools.
- Managed and reviewed Hadoop log files.
- Developed Hive queries and UDFs to analyze and transform the data in HDFS.
- Developed Hive scripts for implementing control tables logic in HDFS.
- Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
- Worked on writing complex Hive queries and Spark scripts.
- Moved data between Oracle and HDFS using Sqoop to supply data to business users.
- Wrote Spark jobs using RDDs, pair RDDs, transformations and actions, and DataFrames to transform data from relational sets.
- Analyzed large data sets and derived customer usage patterns by developing new MapReduce programs in Java.
- Integrated data quality plans as a part of ETL processes.
- Responsible for developing Pig Latin scripts for extracting data using JSON Reader function.
- Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.
- Wrote Pig scripts for sorting, joining, filtering, and grouping data.
- Wrote Pig scripts to process unstructured data and create structured data for use with Hive.
- Involved in knowledge sharing sessions with teams.
- Implemented test scripts to support test driven development and continuous integration.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Worked with Spark using Scala and Spark SQL for faster testing and processing of data.
- Used Hue browser for interacting with Hadoop components.
- Worked on the Hortonworks distribution of Hadoop and used Ambari to monitor cluster health.
Environment: Hadoop, Hive, GitHub, Spark, Pig, Tableau, MapReduce, Sqoop, HDP, Python, Shell Scripting, Linux, Oozie.
- Involved in requirement analysis, design, development, testing, and documentation.
- Worked closely with the SMEs for knowledge transition of existing People central systems.
- Created technical design documents listing the extract, transform, and load techniques and business rules.
- Interacted with business analysts to translate new business requirements into technical specifications.
- Developed new procedures, triggers, and functions, and made changes to existing PL/SQL procedures and packages as per the requirements.
- Extensively involved in writing SQL queries (subqueries and join conditions) for building and testing ETL processes.
- Created tables, indexes, views, constraints, sequences, triggers, synonyms, tablespaces, nested tables, and database links using SQL and PL/SQL.
- Developed data marts for different vendors using PL/SQL blocks and SQL queries by joining the dimension tables and lookup tables.
- Built a data warehouse by integrating the data marts using complex queries.
- Improved application quality by applying performance tuning techniques.
- Developed mappings in Informatica Designer to populate data into the target.
- Used Source Analyzer and Warehouse Designer to import the source and target database schemas, and Mapping Designer to map the sources to the targets.
- Assisted in building the ETL source-to-target specification documents by understanding the business requirements.
- Worked closely with DBAs to migrate the Oracle database synchronization between the primary and secondary data centers from SharePlex to Data Guard, and performed the necessary validations.
- Participated in Disaster Recovery testing process between prod primary database and BCP databases.
- Developed mappings that perform extraction, transformation, and load of source data into the data mart using various PowerCenter transformations such as Source Qualifier, Aggregator, Filter, Router, Sequence Generator, Lookup, Rank, Joiner, Expression, Stored Procedure, SQL, and Normalizer.
- Designed and developed complex Informatica mappings, including Type II slowly changing dimensions.
- Performed transformations such as Merge, Sort, and Update to get the data into the required format.
- Automated the Informatica jobs using UNIX shell scripting.
- Developed standard and re-usable procedures and functions.
- Involved in performance tuning using methods such as creating indexes, adding hints, and removing unnecessary joins and columns.
- Unit tested the deliverables and prepared the test result summary.
- Prepared unit and integration test plans, and performed code reviews and testing.
Environment: Oracle 10g/11g, SQL*Loader, Informatica PowerCenter, Windows.
- Generated the classes and interfaces from the designed UML sequence diagrams and coded as per those plans along with the team.
- Developed user interfaces for the policy owner module using JSPs and the Struts tag library.
- Developed necessary DAOs (Data Access Objects) for policy owner module.
- Built the application on an MVC architecture, with JSP, CSS, HTML, and DHTML as the presentation layer and the Struts framework as the business layer.
- Performed client-side and server-side validations using Struts validations.
- Created ActionForm beans and performed validation by configuring the Struts Validator.
- Used DispatchAction to group related actions into a single class.
- Used multithreading to improve overall performance, and applied the Singleton design pattern in the Hibernate utility class.
- Created database tables for the application on an Oracle database.
- Created the data model and SQL scripts for the application.
- Built the application using Ant, with Eclipse as the IDE.
- Actively involved in testing, debugging and deployment of the application on WebLogic Application server.
- Developed test cases and performed unit testing using JUnit.
- Participated in Design Session and Design Reviews.
- Provided Production Support on a weekly on-call rotation basis.