Senior Big Data Engineer Resume
McLean, VA
SUMMARY:
- 9+ years of IT experience gathering and analyzing customers' technical requirements and delivering development, management, and maintenance projects on platforms such as Hadoop, QlikView, and Talend.
- Expertise in processing 25 PB of data across 700 nodes, including Dev and Prod clusters.
- Excellent understanding of Hadoop architecture and the different daemons of a Hadoop cluster, including Job Tracker, Task Tracker, Name Node, and Data Node.
- Working experience with the Snowflake elastic data warehouse, a cloud-based data warehouse for storing and analyzing data.
- Designed and developed an automation framework that reduces the cost and effort of development tasks.
- Working experience with Talend components such as Java Row, Map, JDBC, Context, HBase, Hive, and Pig, as well as big data batch-processing and stream-processing components.
- Experience working with Hadoop in standalone, pseudo-distributed, and fully distributed modes.
- Hands-on experience with big data tools such as Facebook Presto, Apache Drill, and Snowflake.
- Hands-on experience with Kafka streaming and Scala-based Spark Streaming.
- Experienced in using Spark to improve the performance and optimization of existing Hadoop algorithms, working with Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Used Jupyter and Spyder notebooks to create Python scripts for data analysis and transformation with libraries such as pandas and NumPy, with Python installed through Anaconda Navigator.
- Efficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.
- Experienced in handling different file formats, including text, sequence, and JSON files.
- Expertise in implementing ad-hoc queries using HiveQL.
- Performed extensive data validation using Hive dynamic partitioning and bucketing (see the sketch following this summary).
- Experience with Microsoft Azure, including Azure Data Lake and Blob Storage, using tools such as Data Factory and AzCopy.
- Working experience with Apache Phoenix, a massively parallel relational database engine for analyzing data on Apache Hadoop.
- Implemented data quality and price-gap rules in the Talend ETL tool.
- Expertise in developing Hive generic UDFs to incorporate complex business logic into HiveQL.
- Experienced in using aggregate functions and table-generating functions, and in implementing UDFs to handle complex objects.
- Experienced with join optimizations such as map joins and sorted bucketed map joins, as well as Hive merge, update, and delete operations and Hue.
- Worked with QlikView extensions such as SVG maps and HTML content.
- Developed set analysis expressions to provide custom functionality in QlikView applications.
- Used binary load, resident load, preceding load, and incremental load during data modeling.
- Experienced in performance-tuning Hive queries using Hive configuration parameters.
- Developed Pig Latin scripts using built-in operators to extract data from data files and load it into HDFS.
- Experience using Apache Sqoop to import and export data between different sources and HDFS/Hive.
- Worked with file formats such as RCFile, SequenceFile, ORC, and Avro.
- Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
- Good at working on low-level design documents and system specifications.
- Experience working with BI teams to translate big data requirements into Hadoop-centric technologies.
- Good experience in Hadoop administration, including cluster configuration, single-node and multi-node setup, and installation in distributed environments.
- Trained around 50 associates on Hadoop, QlikView, and their related components.
- Exposure to shell scripting for build and deployment processes.
- Worked as part of an agile team, serving as a developer to customize, maintain, and enhance a variety of Hadoop applications.
- Comprehensive knowledge of the software development life cycle, coupled with excellent communication skills.
- Strong technical and interpersonal skills, combined with a firm commitment to meeting deadlines.
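The following is a minimal sketch of the Hive dynamic partitioning and bucketing pattern referenced above, wrapped in a UNIX shell script; the table and column names (sales_raw, sales_part, txn_id, txn_date) are hypothetical placeholders rather than project artifacts.

    #!/bin/bash
    # Hypothetical sketch: load a raw table into a partitioned, bucketed Hive table.
    hive -e "
      SET hive.exec.dynamic.partition=true;
      SET hive.exec.dynamic.partition.mode=nonstrict;

      CREATE TABLE IF NOT EXISTS sales_part (
          txn_id BIGINT,
          amount DECIMAL(10,2)
      )
      PARTITIONED BY (txn_date STRING)
      CLUSTERED BY (txn_id) INTO 32 BUCKETS
      STORED AS ORC;

      -- The dynamic partition value comes from the last column of the SELECT list.
      INSERT OVERWRITE TABLE sales_part PARTITION (txn_date)
      SELECT txn_id, amount, txn_date FROM sales_raw;
    "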
WORK EXPERIENCE:
Senior Big Data Engineer
Confidential, McLean, VA
Responsibilities:
- Created a development framework in Python that enables developers to save time and money.
- Planned, designed, developed, tested, and documented big data, data lake, and analytics solutions.
- Responsible for improving Spark performance and optimizing existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (a tuning sketch follows this list).
- Provided guidance on better integrating data assets into a single household database (versus the current plan).
- Responsible for designing and developing an automated framework that streamlines the development process in the Enterprise Data Lake.
- Worked with cross-functional consulting teams within the data science and analytics organization to design, develop, and execute solutions that derive business insights and solve clients' operational and strategic problems.
- Involved in migrating Hive queries into Apache NiFi; worked on Azure Data Lake and Blob Storage.
- Worked in an Agile Scrum model, participated in sprint activities, and gathered and analyzed business requirements.
- Developed with GitHub, Apache NiFi, and Microsoft Azure tools, and deployed projects into production environments.
- Responsible for Hive query performance, improving the efficiency and latency of time-consuming queries to save business users time and money.
- Wrote automated Apache NiFi dataflows and processors.
- Responsible for coding new development and maintaining existing systems.
- Converted project specifications into detailed instructions and logical coding steps.
- Analyzed workflow charts and diagrams, applying knowledge of requirements, analysis, design, testing, and software application implementation.
- Applied broad knowledge of programming techniques and big data solutions to evaluate business user requests for new applications, performing further development after due analysis and design.
- Responsible for knowledge transfer and user activities.
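A minimal sketch of the kind of Spark-on-YARN tuning referred to in this list; the flag values, application script, and path are hypothetical illustrations rather than settings from the actual project.

    #!/bin/bash
    # Hypothetical spark-submit invocation showing common performance knobs:
    # executor sizing, shuffle parallelism, and the Kryo serializer.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 20 \
      --executor-cores 4 \
      --executor-memory 8g \
      --conf spark.sql.shuffle.partitions=400 \
      --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
      enrich_enterprise_data.py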
Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, HBase, Apache NiFi, Microsoft Azure, Python, Spark.
Senior Hadoop Developer
Confidential, Springfield, VA
Responsibilities:
- Designed and developed the data lake enterprise-layer gold conformed process, which is available to the consumption team and business users for analytics.
- Responsible for designing and developing an automated framework that streamlines the development process in the data lake.
- Integrated Talend with HBase, storing processed enterprise data in separate column families and column qualifiers.
- Used crontab and Zena scheduling to trigger jobs in production.
- Worked with cross-functional consulting teams within the data science and analytics team to design, develop, and execute solutions to derive business insights and solve clients' operational and strategic problems.
- Involved in migrating Teradata queries to Snowflake data warehouse queries.
- Worked in an Agile Scrum model and was involved in sprint activities.
- Gathered and analyzed business requirements.
- Worked on various Talend integrations with HBase, Avro format, Hive, Phoenix, and Pig components.
- Worked with GitHub, Zena, Jira, and Jenkins, and deployed projects into production environments.
- Involved in cluster coordination services through ZooKeeper.
- Worked on integration with Phoenix thick and thin clients, and was involved in installing and developing Phoenix-Hive and Hive-HBase integrations.
- Wrote automated UNIX shell scripts and developed an automation framework with Talend and UNIX.
- Created merge, update, and delete scripts in Hive and worked on performance-tuning Hive joins (see the sketch below).
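A minimal sketch of the Hive merge/update/delete pattern mentioned above, wrapped in a shell script; the table and column names are hypothetical, and it assumes the target is an ACID transactional Hive table.

    #!/bin/bash
    # Hypothetical sketch: upsert a staging feed into a transactional Hive table with MERGE.
    hive -e "
      MERGE INTO customer_dim AS t
      USING customer_staging AS s
      ON t.customer_id = s.customer_id
      WHEN MATCHED AND s.op_code = 'D' THEN DELETE
      WHEN MATCHED THEN UPDATE SET name = s.name, address = s.address
      WHEN NOT MATCHED THEN INSERT VALUES (s.customer_id, s.name, s.address);
    "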
Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, HBase, Avro format, Phoenix, Talend, Snowflake.
Hadoop Developer
Confidential, Lynchburg, VA
Responsibilities:
- Responsible for designing and implementing ETL processes to load data from different sources, perform data mining, and analyze data using visualization and reporting.
- Developed big data solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Involved in migrating Teradata queries, such as updates, inserts, and deletes, into Hive queries.
- Developed Pig scripts for noise reduction and Sqoop scripts for historical loads of big tables exceeding 4 TB.
- Worked in an Agile Scrum model and was involved in sprint activities.
- Gathered and analyzed business requirements.
- Involved in optimizing query performance and data load times in Pig, Hive, and MapReduce applications.
- Expert in optimizing Hive performance using partitioning and bucketing.
- Experienced in interacting with data scientists to implement ad-hoc queries using HiveQL, partitioning, bucketing, and custom Hive UDFs.
- Experienced in optimizing Hive queries and joins, and in using different data files with custom SerDes.
- Designed the process for historical and incremental loads.
- Involved in Sqooping more than 20 TB of data from Teradata to Hadoop (see the sketch below).
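A minimal sketch of the kind of Sqoop pull from Teradata described above; the host, database, credentials path, and table names are hypothetical placeholders.

    #!/bin/bash
    # Hypothetical Sqoop import of one large Teradata table into Hive,
    # split across many mappers to parallelize the historical load.
    sqoop import \
      --connect jdbc:teradata://td-host/DATABASE=edw \
      --driver com.teradata.jdbc.TeraDriver \
      --username etl_user \
      --password-file /user/etl/.td_password \
      --table TRANSACTION_HISTORY \
      --split-by TXN_ID \
      --num-mappers 24 \
      --hive-import \
      --hive-table staging.transaction_history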
Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, HBase, Pig, Sqoop, Talend.
Hadoop Developer
Confidential, Staunton, VA
Responsibilities:
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Analyzed assigned user stories in Jira (Agile software) and created design documents.
- Attended daily stand-ups and updated the hours burned down in Jira.
- Worked in an Agile Scrum model and was involved in sprint activities.
- Gathered and analyzed business requirements.
- Created and implemented business, validation, coverage, and price-gap rules on Hive and Greenplum databases using the Talend tool.
- Involved in developing Talend components to validate data quality across different data sources.
- Involved in analyzing business validation rules and evaluating options for implementing them in Talend.
- Exceptions thrown during data validation rule execution were sent back to business users to remediate the data and ensure clean data across sources.
- Worked on the Global ID tool to apply the business rules.
- Automated and scheduled the rules on a weekly and monthly basis in TAC (Talend Administration Center).
- Created a weekly scheduling process with cron.
- Created and maintained Hive tables and Greenplum tables on a weekly basis.
- Collected data from an FTP server and loaded it into Hive tables (a sketch follows this list).
- Partitioned the collected logs by date/timestamps and host names.
- Developed data quality rules on top of external vendors' data.
- Imported data frequently from MySQL to HDFS using Sqoop.
- Supported operations team in Hadoop cluster maintenance activities including commissioning and decommissioning nodes and upgrades.
- Used QlikView for visualization and for generating reports.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Managing and scheduling Jobs using Oozie on a Hadoop cluster.
- Involved in data modeling using QlikView integration of data sources (ETL) with QlikView reports.
- Involved in defining job flows, managing and reviewing log files.
- Monitored workload, job performance, and capacity planning using Cloudera Manager.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs.
- Responsible for loading and transforming large sets of structured, semi structured and unstructured data.
- Responsible to manage data coming from different sources.
- Implemented pushing data from Hadoop to Greenplum.
- Worked on pre-processing the data using Pig regular expressions.
- Gained experience with NOSQL database.
- Worked on scheduling the jobs through Resource Manager.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Talend jobs.
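A minimal sketch of the weekly FTP-to-Hive load and cron scheduling mentioned above; the host, directories, credentials variable, and table name are hypothetical placeholders.

    #!/bin/bash
    # Hypothetical weekly load: pull a vendor file from FTP, stage it in HDFS, load it into Hive.
    set -e
    RUN_DATE=$(date +%Y-%m-%d)
    wget --user=vendor --password="$FTP_PASS" -P /data/landing \
      "ftp://ftp.vendor.example.com/exports/feed_${RUN_DATE}.csv"
    hdfs dfs -put -f "/data/landing/feed_${RUN_DATE}.csv" /staging/vendor_feed/
    hive -e "LOAD DATA INPATH '/staging/vendor_feed/feed_${RUN_DATE}.csv'
             INTO TABLE vendor_feed PARTITION (load_date='${RUN_DATE}');"

    # Example crontab entry to run the load every Monday at 02:00:
    # 0 2 * * 1 /opt/jobs/load_vendor_feed.sh >> /var/log/vendor_feed.log 2>&1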
Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, Greenplum, Talend.
QlikView Developer
Confidential, Elkton, VA
Responsibilities:
- Worked in Agile Scrum model and involved in sprint activities.
- Analyzed business requirements and implemented customer-friendly dashboards.
- Implemented Section Access for security.
- Involved in data modeling using QlikView integration of data sources (ETL) with QlikView reports.
- Identify and improve weak areas in the applications, performance reviews and code walk through to ensure quality.
- Created QVDs and designed QlikView dashboards using different types of QlikView objects.
- Modified ETL scripts while loading data, resolving loops and ambiguous joins.
- Wrote complex expressions using the Aggregation functions to match the logic with the business SQL.
- Performance tuning by analyzing and comparing turnaround times between SQL and QlikView.
- Worked with QlikView extensions such as SVG maps and HTML content.
- Developed set analysis expressions to provide custom functionality in QlikView applications.
- Used binary load, resident load, preceding load, and incremental load during data modeling.
Environment: Hadoop, HDFS, Hive, SQL, and QlikView.
Hadoop Developer
Confidential, Reston, VA
Responsibilities:
- Worked on analyzing Hadoop clusters using different big data analytic tools, including Pig, Hive, and MapReduce.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis (a sketch follows this list).
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Created HBase tables to store various formats of PII data coming from different portfolios.
- Implemented test scripts to support test-driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Provided cluster coordination services through ZooKeeper.
- Experience in managing development time, bug tracking, project releases, development speed, release forecast, scheduling and many more.
- Involved in loading data from LINUX file system to HDFS.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Developed a Java program to extract values from XML using XPath.
- Experience working on processing unstructured data using Pig and Hive.
- Supported MapReduce programs running on the cluster.
- Gained experience in managing and reviewing Hadoop log files.
- End-to-end performance tuning of Hadoop clusters and Hadoop Map/Reduce routines against very large data sets.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Assisted in monitoring Hadoop cluster using tools like Cloudera Manager.
- Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for HDFS.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
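A minimal sketch of the Flume log-collection setup referred to in the list above; the agent name, spool directory, and HDFS path are hypothetical placeholders.

    #!/bin/bash
    # Hypothetical Flume agent: pick up spooled web logs and land them in HDFS, then start the agent.
    cat > /etc/flume/conf/weblogs.conf <<'EOF'
    a1.sources  = r1
    a1.channels = c1
    a1.sinks    = k1

    a1.sources.r1.type     = spooldir
    a1.sources.r1.spoolDir = /var/log/webapp/spool
    a1.sources.r1.channels = c1

    a1.channels.c1.type     = memory
    a1.channels.c1.capacity = 10000

    a1.sinks.k1.type                   = hdfs
    a1.sinks.k1.channel                = c1
    a1.sinks.k1.hdfs.path              = /raw/weblogs/%Y-%m-%d
    a1.sinks.k1.hdfs.fileType          = DataStream
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    EOF

    flume-ng agent --conf /etc/flume/conf --conf-file /etc/flume/conf/weblogs.conf --name a1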
Environment: Hadoop (CDH4), MapReduce, HBase, Hive, Sqoop, Oozie.