- Overall 8 years of IT experience across a variety of industries, including 3+ years in Big Data analytics and development, with strong knowledge of the Hadoop framework and parallel processing.
- Experience with Hadoop ecosystem components: HDFS, MapReduce, Hive, Pig, Sqoop, YARN, and AWS.
- Proficient in all phases of software engineering, including analysis, design, coding, testing, and implementation, as well as Agile methodologies.
- Experience with Cloudera and Hortonworks distributions.
- Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
- Experience in developing Spark applications using Spark RDD, Spark SQL, and DataFrame APIs.
- Proficient in developing data transformation and other analytical applications in Spark, Spark-SQL using Python programming language (PySpark).
- Familiar with various file formats, including delimited text files, clickstream logs, Apache logs, Avro, JSON, and XML files.
- Good knowledge of OOP concepts and design patterns.
- Involved in preparing ETL mapping specification documents and transformation rules for the mappings.
- Good understanding of dimensional modeling (star and snowflake schemas; SCD Types 1, 2, and 3) at the logical and physical levels.
- Good knowledge of scalable, secure cloud architectures based on Amazon Web Services (leveraging AWS EMR clusters, EC2, S3, etc.).
- Hands-on experience with Spark Streaming connecting to Kafka clusters.
- Architected, designed, and maintained high-performing ETL processes.
- Hands-on experience with the build-automation tools Maven and Jenkins.
- Worked with version control tools such as Bitbucket, Git, and SVN.
- Experienced with scripting languages such as Python and shell scripts.
- Good experience with SQL, PL/SQL, and database concepts.
- Exceptional ability to learn new technologies and deliver under tight deadlines.
- Team player with good interpersonal and problem-solving skills, able to work both in a team and independently.
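The dimensional-modeling experience above (SCD Type 2 in particular) can be sketched in plain Python. This is a minimal illustration of the expire-and-insert pattern; the dimension fields (`key`, `attr`, `start_date`, `end_date`) are hypothetical, not taken from any actual project schema.

```python
from datetime import date

def scd2_upsert(dimension, incoming, today=date(2020, 1, 1)):
    """Apply a Type 2 slowly-changing-dimension update: expire the
    current row for a changed key and insert a new open-ended version."""
    for row in incoming:
        current = next(
            (d for d in dimension
             if d["key"] == row["key"] and d["end_date"] is None),
            None,
        )
        if current is None:
            # Brand-new key: insert as the current version.
            dimension.append({**row, "start_date": today, "end_date": None})
        elif current["attr"] != row["attr"]:
            current["end_date"] = today  # close out the old version
            dimension.append({**row, "start_date": today, "end_date": None})
    return dimension

# One existing open row; an update for key 1 and a brand-new key 2.
dim = [{"key": 1, "attr": "CA", "start_date": date(2019, 1, 1), "end_date": None}]
dim = scd2_upsert(dim, [{"key": 1, "attr": "TX"}, {"key": 2, "attr": "MO"}])
```

The same expire-and-insert logic maps directly onto a Hive/Spark merge: close the matching current row, then append the new version with an open end date.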
Big Data Ecosystems: Hadoop, MapReduce, Spark, HDFS, HBase, Pig, Hive, Sqoop, Kafka, Hue, Cloudera, Hortonworks, Oozie, and Airflow
Spark Technologies: Spark SQL, Spark DataFrames, and RDD
Scripting Languages: Python and shell scripting
Programming Languages: Java, Python, SQL, PL/SQL
Cloud Technologies: AWS EMR, EC2, and S3
Databases: Oracle 12c, MySQL, and Microsoft SQL Server
NoSQL Technologies: HBase
BI Tools: Tableau, Kibana
Web Technologies: HTML, CSS, XML, SOAP, and REST
Development Tools: Eclipse, PyCharm, Git, ANT, Maven, Jenkins, Bamboo, SOAP UI, QC, Jira, Bugzilla
Methodologies: Agile/Scrum, Waterfall
Operating Systems: Windows XP/7/8/10, UNIX, Linux
Confidential, Pleasanton, CA
Big Data Developer
- Actively worked with business client SMEs to gather requirements for project planning and development.
- Coordinated with Landing Zone SMEs while performing the SDLC phases.
- Collaborated with source teams to review details against the technical specification document.
- Involved in designing and developing ingestion and refinement frameworks.
- Involved in designing the incremental approach based on the specific requirements of each use case.
- Ingested data from various sources, such as Oracle and DB2, into the HDFS raw zone using Sqoop as part of the ingestion framework.
- Optimized Sqoop ingestion jobs by tuning the number of mappers and the degree of parallelism.
- Performed data cleansing and manipulation on data received as flat files.
- Handled and ingested flat files using SFTP and BCP operations.
- Performed data ingestion in file formats such as Avro, Parquet, and ORC.
- Integrated scripts to support the ingestion process for various databases.
- Developed test cases and validated ingested files in the HDFS raw zone.
- Fine-tuned Hive queries for better performance.
- Developed Hive scripts using Spark SQL to denormalize and aggregate data in refinement jobs.
- Performed data manipulation using Spark DataFrames.
- Optimized Hive queries in refinement by applying partitioning and bucketing where required.
- Handled skew in a Hive view query by salting keys with a random function, reducing execution time from 6 hours to 1 hour.
- Optimized Spark SQL jobs by repartitioning data and tuning memory parameters.
- Developed Spark UDFs in Python for percentage calculations.
- Reduced a Hive query's execution time from 5 hours to 1.5 hours by operating on partitioned tables.
- Analyzed and fixed production issues raised by the business team.
- Provided production support for use cases live in the PROD environment.
- Used IBM Tivoli Workload Scheduler and crontab to execute data workflows.
- Experienced with Git versioning and Jenkins for building projects.
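The skew handling mentioned in the bullets above (spreading a hot key with a random suffix) can be illustrated in plain Python. In the actual Hive/Spark jobs this would be a `rand()`-based salt expression; the sketch below only demonstrates the idea, with hypothetical key names.

```python
import random
from collections import Counter

# Seeded RNG so the illustration is deterministic.
_rng = random.Random(42)

def salt_key(key, num_salts=8):
    """Spread a hot join/group key across num_salts buckets by
    appending a random suffix (the 'salt'), so no single task
    receives all rows for that key."""
    return f"{key}_{_rng.randrange(num_salts)}"

# A skewed dataset: one hot key dominates and would land on one reducer.
rows = ["hot"] * 1000 + ["cold"] * 10
salted = Counter(salt_key(k) for k in rows)

# The hot key is now split across up to 8 buckets instead of one.
hot_buckets = [k for k in salted if k.startswith("hot_")]
```

After the salted aggregation, a second pass that strips the suffix and re-aggregates recovers the final per-key totals.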
Confidential, San Jose, CA
Big Data Engineer
- Developed multiple jobs in Pig for data cleaning and processing.
- Implemented custom incremental logic for loading payload data using Pig.
- Performed various joins on staging tables to handle insert and update records.
- Used the Avro file format in Pig Latin to load and store data.
- Developed Pig scripts to convert data from Avro to text file format.
- Created Hive external tables for analytical querying of data present in HDFS.
- Worked on partitioning Hive tables and running scripts in parallel to reduce their run time.
- Developed workflows in Oozie to automate loading data into HDFS and pre-processing it with Pig.
- Created custom scripts to validate data present in HDFS.
- Monitored and analyzed YARN application logs.
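The custom incremental-load logic mentioned above can be sketched in plain Python: keep a watermark of the last processed timestamp and pick up only newer records on each run. The record shape and field names (`id`, `updated_at`) are hypothetical, shown only to illustrate the pattern.

```python
def incremental_load(records, watermark):
    """Return records newer than the watermark plus the advanced
    watermark, mirroring an incremental Pig/Sqoop load."""
    fresh = [r for r in records if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in fresh), default=watermark)
    return fresh, new_watermark

source = [
    {"id": 1, "updated_at": 100},
    {"id": 2, "updated_at": 205},
    {"id": 3, "updated_at": 310},
]

batch1, wm = incremental_load(source, watermark=0)   # first run: full load
source.append({"id": 4, "updated_at": 420})          # new record arrives
batch2, wm = incremental_load(source, watermark=wm)  # second run: delta only
```

Persisting `wm` between runs (in a control table or file) is what makes the job restartable without reprocessing old records.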
Confidential, St Louis, MO
Big Data Engineer
- Developed data pipelines using Spark and Hive on Amazon EMR clusters.
- Developed Spark code to pull data from S3 buckets for faster data processing.
- Ingested data from S3 and analyzed it using Spark (DataFrames and Spark SQL) and a series of Hive scripts to produce summarized results for downstream systems.
- Worked with Airflow to orchestrate the entire data pipeline.
- Created the required Airflow DAGs using the various Airflow operators needed to orchestrate the workflow.
- Developed a Python application to pull data from project-related REST APIs.
- Built dashboards in Kibana over downstream data present in S3 buckets (Kibana on AWS).
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2.
- Developed a file-watcher application to handle ingestion of data present in S3.
- Developed the Confidential MFP analytics reporting application using Hive.
- Developed a Hadoop-based solution for customer-care call prediction.
- Developed shell scripts for cleaning, validating, and transforming data.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Imported data from different sources into HDFS using Sqoop, performed transformations using Hive, and loaded the results back into HDFS.
- Exported the analyzed data to relational databases using Sqoop, feeding downstream data marts.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Scheduled data extracts on a daily basis.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Participated in requirement gathering and converting the requirements into technical specifications.
- Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
- Created views to facilitate easy user interface implementation, and triggers on them to facilitate consistent data entry into the database.
- Involved in performance tuning of T-SQL queries.
- Performed data migration (import/export via BCP) from text files to SQL Server.
- Generated ad-hoc reports using MS Excel and Crystal Reports.
- Involved in the design, analysis, implementation, testing, and support of ETL processes for Stage, ODS, and Mart layers.
- Tested web services and WSDLs using SOAP UI.
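The log-parsing work described above (structuring raw logs into tabular form for querying) can be sketched in plain Python with a regular expression. The Apache-style log line and the column names are illustrative, not from any actual project.

```python
import re

# Common Log Format-style line: host, timestamp, request, status, bytes.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def parse_log_line(line):
    """Turn one raw log line into a tabular row (dict); the same
    named fields would become columns in a Hive table."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

row = parse_log_line(
    '10.0.0.1 - - [10/Oct/2019:13:55:36 -0700] '
    '"GET /index.html HTTP/1.1" 200 2326'
)
```

In a Hive job the equivalent would be a SerDe or regexp-based extraction that maps the same capture groups to table columns.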