Hadoop Spark Developer Resume
Dallas, Texas
PROFILE SUMMARY:
- Nearly 5 years of professional experience working with data, including 3 years of hands-on experience in the analysis, design, development and maintenance of Hadoop and Java based applications.
- Strong understanding of Hadoop architecture and components such as Flume and MapReduce, with experience writing MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
- Extensive development experience with the Hadoop ecosystem, covering MapReduce, HDFS, YARN, Hive, Impala, Pig, HBase, Spark, Sqoop, Oozie and Cloudera.
- Hands-on experience writing MapReduce jobs in Java.
- Good knowledge of Spark.
- Strong experience in analyzing data using HiveQL, Spark SQL, HBase and custom MapReduce programs.
- Imported data into and exported data from HDFS and Hive using Sqoop.
- Working knowledge of Scala for Spark development.
- Strong problem-solving, organizational, team management, communication and planning skills, with the ability to work in a team environment. Able to write clear, well-documented, well-commented and efficient code to requirements.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Excellent knowledge of Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode, and of the MapReduce programming paradigm.
- Good knowledge of NoSQL databases such as HBase.
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Spark, Oozie, Flume, YARN
Programming Languages: Java, C/C++, Scala
Scripting Languages: JavaScript, HTML
Tools: Eclipse, SSIS, SSRS, SSAS, Tableau, Apache Spark, NetBeans
Databases: Microsoft SQL Server, Oracle, Microsoft Access
PROFESSIONAL EXPERIENCE:
Confidential, Dallas, Texas
Hadoop Spark Developer
Responsibilities:
- Designed and built the Reporting Application, which uses Spark SQL to fetch HBase table data and generate reports.
- Used Spark RDDs, DataFrames and Datasets for data transformation and processing.
- Performed conversion of Hive/SQL queries into Spark SQL for better performance.
- Developed Spark jobs in Scala for batch analysis as per business requirements.
- Used Sqoop to import data from Oracle RDBMS to HDFS and vice versa.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Created Managed and External Hive tables with static/dynamic partitioning.
- Improved HiveQL query performance by splitting large queries into smaller ones and introducing temporary tables between them.
- Optimized Hive queries by tuning different combinations of Hive parameters.
- Developed UDFs (User Defined Functions) to extend the core functionality of Pig and Hive queries as required.
- Wrote Pig scripts to transform raw data from several data sources into baseline data.
- Implemented workflows using Oozie for running MapReduce jobs and Hive queries.
Environment: Hadoop, EMR, Amazon S3, Redshift, Ambari, HDFS, Hive, Impala, Spark, Scala, Python, Pig, Sqoop, Oozie, Git, Oracle, DB2, MySQL, UNIX Shell Scripting, JDBC.
Confidential, Chicago, IL
Big Data Developer
Responsibilities:
- Developed Pig Latin scripts to sort, join and filter enterprise data.
- Developed Pig Latin scripts to extract data from log files and store it in HDFS. Created User Defined Functions (UDFs) to pre-process data for analysis.
- Used Pig for data cleansing and extracting the data from the web server output files to load into HDFS.
- Developed simple to complex MapReduce jobs in Java, along with jobs implemented using Hive and Pig.
- Supported MapReduce programs running on the cluster.
- Created Hive schemas using performance techniques such as partitioning and bucketing.
- Created internal and external Hive tables and defined static and dynamic partitions for optimized performance.
- Used Hive and Pig to analyze data in HDFS to identify issues and behavioral patterns.
- Installed and configured Hadoop; maintained the cluster and managed and reviewed Hadoop log files.
- Helped set up the QA environment and updated configurations for implementing scripts with Pig and Sqoop.
- Developed MapReduce programs in Java for data analysis and loaded data from various data sources into HDFS.
- Worked on large sets of structured, semi-structured and unstructured data.
- Created HBase tables to load large datasets from different databases, while comparing them with Hive and Pig.
- Wrote multiple queries to pull data from HBase.
Environment: Apache Hadoop 0.20.203, Cloudera Manager (CDH4), HDFS, Java MapReduce, Eclipse, Hive, Pig, Sqoop, Oozie, SQL, Oracle 11g.
Confidential
Business Intelligence Developer
Responsibilities:
- Responsible for gathering requirements from Business Analysts and Operational Analysts and identifying the data sources required for the request.
- Used joins such as INNER and OUTER JOIN, along with UNION and UNION ALL, while creating tables.
- Designed SSIS packages to extract data from sources such as Oracle, MS Excel and MS Access and load it into the data warehouse.
- Developed SSIS packages for cleaning and scrubbing data, using Lookup and Fuzzy Lookup transformations to validate the data.
- Used SSIS and T-SQL to transfer data from various sources to a staging area and then into the data warehouse.
- Loaded Excel sheets into staging tables by using Bulk Copy Program and SSIS packages.
- Developed, maintained, troubleshot and updated SSRS reports.
- Manipulated, cleansed and processed data using Excel, Access and SQL.
- Responsible for loading, extracting and validating client data.
- Designed and implemented data integration modules for Extract/Transform/Load (ETL) functions.
- Analyzed raw data, drew conclusions and developed recommendations.
- Wrote T-SQL scripts to manipulate data for data loads and extracts.
- Wrote SQL stored procedures and views, and coordinated and performed in-depth testing of new and existing systems.
- Designed and developed database objects such as tables, user-defined data types, functions, views, stored procedures, clustered and non-clustered indexes, and constraints on tables in the data warehouse.
- Used various data flow transformations in packages based on business needs, such as Data Conversion, Conditional Split, Multicast, Union All, Merge, Merge Join and Sort.
- Successfully transferred old data from various sources like flat files, MS Access, and Excel into MS SQL Server 2005 using SSIS Packages.
- Created different types of reports, including cross-tab, conditional, drill-down, sub-reports and parameterized reports.
- Created SSIS packages using data transformations such as Derived Column, Lookup, Conditional Split, Merge Join and Sort, along with the Execute SQL Task, to load data into the database.
- Generated different types of reports in SSRS, such as cascading, drill-down, conditional, cross-tab and parameterized reports.
- Identified client requirements and, based on them, created a dashboard with daily, weekly and monthly reports.
Environment: MS-Excel, SSIS, SSRS, MS-SQL Server
Confidential
Junior Java Developer
Responsibilities:
- Analyzed the system and gathered the system requirements.
- Actively involved in the analysis, design, implementation and deployment phases of the full Software Development Lifecycle (SDLC) of the project.
- Designed and developed user interface using JSP, HTML and JavaScript.
- Developed the web tier using JSP to show account details and summary.
- Used Tomcat web server for development purposes.
- Used Oracle as the database and Toad for query execution; wrote SQL scripts and PL/SQL code for procedures and functions.
- Developed application using Eclipse.
- Validated the fields of user registration screen and login screen by writing JavaScript validations.
- Developed stored procedures and triggers using PL/SQL to calculate and update the tables to implement business logic.
- Developed features such as transaction history and product search that help users understand the system efficiently.
- Developed various Java classes and SQL queries to retrieve and manipulate the data.
- Involved in post-production support and maintenance of the application.
- Involved in the design of the overall database using Entity Relationship diagrams.
- Wrote triggers, menus and stored procedures in PL/SQL.
- Involved in developing interactive forms and customization of screens using Forms 4.5.
- Involved in building, debugging and running forms.
- Involved in data loading and extraction using SQL*Loader.
- Designed and developed all the tables and views for the system in Oracle.
- Designed and developed form validation procedures for querying and updating data.
Environment: Oracle 11g, SQL*Plus, SQL*Loader, PL/SQL, Forms 4.5, Reports 2.5
