Big Data Hadoop Developer Resume
Fort Worth, Texas
SUMMARY
- 4 years of experience in Big Data engineering and analytics using Hadoop; working environment includes MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Oozie, and Sqoop.
- Experienced in processing large datasets of different forms, including structured, semi-structured, and unstructured data.
- Hands-on experience with Cloudera and multi-node clusters on the Hortonworks Sandbox.
- Expertise in designing tables in Hive and MySQL, and in importing and exporting data between databases and HDFS using Sqoop.
- Experienced in data architecture, including data ingestion pipeline design, Hadoop architecture, data modeling, machine learning, and advanced data processing.
- Experience optimizing ETL workflows that process data arriving from multiple sources.
- ETL experience covering data extraction, management, aggregation, and loading into HBase.
- Expertise in developing Pig Latin scripts and Hive Query Language (HiveQL).
- Extensive knowledge of ZooKeeper for various types of centralized configuration.
- Experienced in integrating various data sources and technologies, including Java, JDBC, RDBMS, shell scripting, spreadsheets, and text files.
- Experience in managing and reviewing Hadoop log files using Flume and Kafka; also developed Pig UDFs and Hive UDFs to pre-process data for analysis.
- Hands-on experience with Spark for handling streaming data (see the sketch after this list).
- Hands-on experience with Spring Tool Suite for developing Scala applications.
- Shell scripting to load and process data from various Enterprise Resource Planning (ERP) sources.
- Hands-on experience writing Pig Latin scripts and using the Pig interpreter to run MapReduce jobs.
- Expertise in Hadoop components such as YARN, Pig, Hive, HBase, Flume, and Oozie, as well as shell scripting in Bash.
- Good understanding of Hadoop architecture and its daemons, including NameNode, DataNode, JobTracker, TaskTracker, and ResourceManager.
- Hands-on experience ingesting data into a data warehouse using various data loading techniques.
- Good knowledge of data modeling and data mining to model data per business requirements.
- Hands-on experience with the MapReduce and Pig programming models, and with installing and configuring Hadoop, HBase, Hive, Pig, Sqoop, and Flume using Linux commands.
- Working knowledge of Agile and Waterfall development models.
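Illustrative example only: a minimal Spark Structured Streaming sketch in Scala showing the kind of streaming ingestion referenced above. The Kafka topic, broker address, and checkpoint path are hypothetical placeholders, not details from any specific engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StreamingIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-ingest-sketch")
      .getOrCreate()

    // Read a stream of events from a Kafka topic (topic and brokers are placeholders).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS line")

    // Simple word-count style aggregation over the stream.
    val counts = events
      .select(explode(split(col("line"), "\\s+")).as("word"))
      .groupBy("word")
      .count()

    // Write the running aggregates to the console; the checkpoint directory is assumed.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/streaming-ingest")
      .start()

    query.awaitTermination()
  }
}
```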
TECHNICAL SKILLS
Big Data Platform: Hortonworks (HDP 2.2)/AWS (S3, EMR, EC2)/Cloudera (CDH3/CDH4)
Analytical Tools: D3.js, Tableau, R, Python
OLAP Concepts: Data warehousing, Data mining concepts
Apache Hadoop: HDFS, HBase, Pig, Hive, Sqoop, Kafka, Zookeeper, Oozie, Ambari, Spark SQL
Source Control: GitHub, VSS, TFS
Databases and NoSQL: MS SQL Server 2012/2008, Oracle 11g (PL/SQL) and MySQL 5.6, MongoDB
Development Methodologies: Agile and Waterfall
Development Tools: Eclipse, Toad, Visual Studio
Programming Languages: Java, .Net
Scripting Languages: JavaScript, JSP, Python, XML, HTML and Bash
PROFESSIONAL EXPERIENCE
Confidential, Fort Worth, Texas
Big Data Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager
- Installed Oozie workflow engine to run multiple Hive and Pig jobs
- Developed a mechanism that loads a large 7 GB CSV file into Hive and exports results into multiple CSV files
- Used Oozie workflow to automate the process
- Observed a significant increase in speed compared to the traditional ETL process
- Extracted and loaded data into Data Lake environment (Amazon S3) which was accessed by business users and data scientists.
- Developed Pig scripts to transform the raw data into meaningful data as specified by business users.
- Handled importing of data from various data sources, performed transformations using Hive, loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop
- Worked extensively with Sqoop for importing metadata from Oracle
- Experience migrating MapReduce programs into Spark transformations using Spark and Scala (see the sketch after this list)
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS
- Optimized Pig jobs using different compression techniques and performance enhancers.
- Optimized complex joins in Pig using techniques such as skewed joins and hash-based aggregations.
- Worked on the Data Pipeline which is an orchestration tool for all our jobs that run on AWS
- Worked on installing and configuring EC2 instances on Amazon Web Services (AWS) for establishing clusters on cloud
- Wrote shell and Python scripts for job automation
- Assisted with the addition of Hadoop processing to the IT infrastructure
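Illustrative sketch for the Spark migration and CSV-to-Hive work above, written in Scala and assuming a Hive-enabled Spark build; the file paths, table names, and the region column are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object CsvToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-hive-sketch")
      .enableHiveSupport() // assumes Spark is built/configured with Hive support
      .getOrCreate()

    // Load the large source CSV from HDFS (path and schema inference are assumptions).
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/incoming/transactions.csv")

    // Persist into a Hive table so analysts can query it directly
    // (the "staging" database is assumed to exist).
    raw.write.mode("overwrite").saveAsTable("staging.transactions")

    // Export result sets as separate CSV directories, split by a hypothetical "region" column.
    raw.write
      .partitionBy("region")
      .option("header", "true")
      .mode("overwrite")
      .csv("hdfs:///data/exports/transactions_by_region")

    spark.stop()
  }
}
```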
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Scala, Pig, Sqoop, Oozie, ZooKeeper, Teradata, PL/SQL, MySQL, Windows, HBase.
Confidential, Denver, CO
Hadoop Developer
Responsibilities:
- Imported data from different relational data sources such as RDBMS and Teradata into HDFS using Sqoop.
- Imported bulk data into HBase using MapReduce programs.
- Performed analytics on time-series data stored in HBase using the HBase API.
- Designed and implemented Incremental Imports into Hive tables.
- Used the REST API to access HBase data for analytics.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data (see the sketch after this list).
- Experienced in working with various data sources such as Teradata and Oracle; successfully loaded files from Teradata into HDFS, and from HDFS into Hive and Impala.
- Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Developed wrappers using shell scripting for Hive, Pig, Sqoop, and Scala jobs.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed Java RESTful web services to upload data from local storage to Amazon S3, list S3 objects, and perform file manipulation operations.
- Successfully ran all Hadoop MapReduce programs on Amazon Elastic MapReduce framework by using Amazon S3 for input and output.
- Designed and developed Dashboards for Analytical purposes using Tableau.
- Migrated ETL jobs to Pig scripts to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
- Worked on different file formats such as SequenceFiles, XML files, and MapFiles using MapReduce programs.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
- Worked on Oozie workflow engine for job scheduling.
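Illustrative sketch of the Spark SQL log-parsing approach described above, in Scala; the log path, the "timestamp level message" line layout, and the output table name are assumptions made for demonstration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object LogParsingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("log-parsing-sketch")
      .enableHiveSupport() // assumes a Hive-enabled Spark deployment
      .getOrCreate()

    // Read raw log lines landed in HDFS (e.g. by Flume); the path is a placeholder.
    val logs = spark.read.textFile("hdfs:///data/logs/app/*.log")

    // Split a hypothetical "timestamp level message" layout into columns.
    val pattern = "^(\\S+ \\S+) (\\w+) (.*)$"
    val parsed = logs.select(
      regexp_extract(col("value"), pattern, 1).as("ts"),
      regexp_extract(col("value"), pattern, 2).as("level"),
      regexp_extract(col("value"), pattern, 3).as("message"))

    // Register as a temp view, aggregate with Spark SQL, then persist to Hive.
    parsed.createOrReplaceTempView("app_logs")
    val countsPerLevel = spark.sql(
      "SELECT level, COUNT(*) AS cnt FROM app_logs GROUP BY level")

    countsPerLevel.write.mode("overwrite").saveAsTable("analytics.log_level_counts")
    spark.stop()
  }
}
```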
Environment: CDH, Map Reduce, Hive, Spark, Oozie, Sqoop, Pig, Java, Rest API, Maven, MRUnit, Junit, Tableau, Cloudera, Python.
Confidential, Boca Raton, FL
SQL BI Developer
Responsibilities:
- Responsible for interaction with business users, gathering requirements and managing the delivery
- Involved in installation, Configuration and administration of Tableau Server
- Building, publishing customized interactive reports and dashboards, report scheduling using Tableau server
- Tracking the performance of sales representatives with respective KPIs
- Prepared dashboards using calculations, parameters, calculated fields, groups, sets, and hierarchies in Tableau
- Published Workbooks by creating user filters so that only appropriate users can view/edit them
- Created users, sites, groups, projects, data connections, and settings as a Tableau administrator
- Created rich dashboards using Tableau Dashboard and prepared user stories to create compelling dashboards to deliver actionable insights
- Validated the Tableau dashboards before publishing them to Tableau Server
- Implemented security for Tableau Server reports based on the user community
- Generated Dashboards with Quick filters, Parameters and sets to handle views more efficiently
- Published Workbooks by creating user filters so that only appropriate teams can view it
- Analyzed the source data and handled it efficiently by modifying data types
- Generated context filters and used performance actions while handling large volumes of data
- Developed Tableau dashboards and checked their performance using Performance Recording
- Generated Tableau dashboards with combination charts for clear understanding.
- Analyzed backend data from SQL Server and Oracle to create effective dashboards.
- Created Tableau Calculations (Table Calculations, Hide Columns, Creating/Using Parameter, Totals) and Formatting (Annotations, Layout Containers, Mark Labels, Rich Text Formatting) using Tableau Desktop
- Created Workbooks and Projects, Database Views, Data Sources and Data Connections
- Performed backup and restoration activities and monitored Tableau Server performance for environmental changes, software upgrades, and patches
Environment: Oracle, SQL Server Integration Services, Transact-SQL, Tableau, Excel Charts
Confidential, Columbia, MD
SQL BI Developer
Responsibilities:
- Worked with client to understand and analyze business requirements to provide the possible technical solutions.
- Reviewed and modified software programs to ensure technical accuracy and reliability.
- Translated business requirements into software applications and models.
- Worked with database objects such as tables, views, synonyms, sequences and database links as well as custom packages tailored to business requirements.
- Built complex queries using SQL and wrote stored procedures using PL/SQL.
- Used Bulk Collections, Indexes, and Materialized Views to improve query execution.
- Ensured compliance with standards and conventions in developing programs.
- Created SQL scripts to convert legacy data (including validations) and load it into the tables.
- Created complex Stored Procedures, Triggers, Functions, Indexes, Tables, Views, Joins and Other SQL code to implement business rules.
- Performed System Acceptance Testing (SAT) and User Acceptance Testing (UAT) for databases. Performed unit testing and path testing for application.
- Analyzed available data from MS Excel, MS Access and SQL server.
- Involved in implementing the data integrity validation checks.
- Resolved and troubleshot complex issues.
Environment: MS SQL Server 2005/2008, Windows 2003/2008, SSIS, SQL Server Management Studio, SQL Server Business Intelligence Studio, SQL Profiler, Microsoft Excel and Access.