Big Data Hadoop Developer Resume
Pittsburgh, PA
SUMMARY:
- 11 years of extensive IT experience, including work as a Big Data/Hadoop developer, master-slave architecture designer, and Hive/Pig developer.
- Experience in Big Data engineering and analytics using the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Oozie, and Sqoop.
- Experienced in processing large datasets of different forms including structured, semi-structured and unstructured data.
- Expertise in SQL Server 2005/2008 Reporting Services (SSRS) and SQL Server 2005 Integration Services (SSIS) for Business Intelligence (BI).
- Hands-on experience with Cloudera and with multi-node clusters on the Hortonworks Sandbox.
- Expertise in designing tables in Hive and MySQL, and in importing and exporting data between relational databases and HDFS using Sqoop.
- Experienced in data architecture, including data ingestion pipeline design, Hadoop architecture, data modeling, machine learning, and advanced data processing.
- Experience optimizing ETL workflows that process data arriving from multiple sources.
- ETL experience covering data extraction, management, aggregation, and loading into HBase.
- Expertise in developing Pig Latin scripts and Hive Query Language (HiveQL) queries.
- Extensive knowledge of ZooKeeper for various types of centralized configuration.
- Experienced in integrating various data sources, including RDBMS via Java/JDBC, shell scripting, spreadsheets, and text files.
- Experience in managing and reviewing Hadoop log files using Flume and Kafka; also developed Pig UDFs and Hive UDFs to pre-process data for analysis.
- Hands-on experience with Spark for handling streaming data (an illustrative sketch follows this list).
- Hands-on experience with Spring Tool Suite for developing Scala applications.
- Shell scripting to load and process data from various Enterprise Resource Planning (ERP) sources.
- Hands-on experience writing Pig Latin scripts that the Pig interpreter runs as MapReduce jobs.
- Expertise in Hadoop components such as YARN, Pig, Hive, HBase, Flume, and Oozie, plus shell scripting in Bash.
- Good understanding of Hadoop architecture and its daemons, including NameNode, DataNode, JobTracker, TaskTracker, and ResourceManager.
- Hands-on experience ingesting data into a data warehouse using various data loading techniques.
- Good knowledge of data modeling and data mining to model data per business requirements.
- Hands-on experience with the MapReduce and Pig programming models, and with installing and configuring Hadoop, HBase, Hive, Pig, Sqoop, and Flume using Linux commands.
- Working knowledge of Agile and waterfall development models.
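As an illustration of the Kafka/Spark streaming experience noted above, the following is a minimal Scala sketch of streaming log messages from a Kafka topic into HDFS with Spark Structured Streaming. The broker address, topic name, and paths are hypothetical placeholders, not taken from any specific project, and the job assumes the spark-sql-kafka connector is on the classpath.

```scala
// Minimal sketch, not project code: stream Kafka log messages into HDFS.
// Broker, topic, and paths are hypothetical placeholders.
import org.apache.spark.sql.SparkSession

object KafkaLogsToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-logs-to-hdfs")
      .getOrCreate()

    // Read the raw message value from the Kafka topic as text lines.
    val logs = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "server-logs")
      .load()
      .selectExpr("CAST(value AS STRING) AS line")

    // Continuously append the lines to an HDFS directory.
    val query = logs.writeStream
      .format("text")
      .option("path", "hdfs:///data/raw_logs")
      .option("checkpointLocation", "hdfs:///checkpoints/raw_logs")
      .start()

    query.awaitTermination()
  }
}
```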
TECHNICAL SKILLS:
Big Data: Cloudera Distribution, HDFS, ZooKeeper, YARN, DataNode, NameNode, ResourceManager, NodeManager, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, Spark, Storm, Scala, Impala
Operating System: Windows, Linux, Unix.
Languages: Java, J2EE, SQL, Python, Scala
Databases and SQL: MS SQL Server 2012/2008, Oracle 11g (PL/SQL) and MySQL 5.6, MongoDB
Web Technologies: JSP, Servlets, HTML, CSS, JDBC, SOAP, XSLT.
Version Tools: GIT, SVN, CVS
IDE: IBM RAD, Eclipse, IntelliJ
Tools: TOAD, SQL Developer, ANT, Log4J
Web Services: WSDL, SOAP.
ETL: Talend ETL, Talend Studio
Web/App Server: UNIX server, Apache Tomcat
PROFESSIONAL EXPERIENCE:
Confidential, Pittsburgh, PA
Big Data Hadoop Developer
Responsibilities:
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
- Worked on the Hortonworks Data Platform (HDP) Hadoop distribution, using Hive to store, retrieve, and query data.
- Implemented Hive optimized joins to gather data from different sources and run ad-hoc queries on them.
- Implemented custom aggregate functions using Spark SQL and performed interactive querying.
- Coordinated with Hortonworks and with the development and operations teams on platform-level issues.
- Extensively worked on combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
- Worked with Spark SQL and DataFrames for faster execution of Hive queries via the Spark SQL context (see the illustrative sketch after this list).
- Used Sqoop to transfer data between relational databases and HDFS, and used Kafka to stream log data from servers.
- Used Pig to perform data transformations, event joins, filtering, and pre-aggregations before storing the data on HDFS.
- Implemented various analytical algorithms as MapReduce programs applied to data in HDFS.
- Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication and schema design.
- Familiar with MongoDB write concern to avoid loss of data during system failures.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Extensively performed CRUD operations (put, get, scan, delete, update) on the HBase database.
- Wrote Hive generic UDFs to perform business logic operations at the table level.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and preprocessing with Pig, Hive, Sqoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Used Hive join queries to join multiple tables of a source system and load the results into Elasticsearch.
- Used Apache Kafka as a messaging system to load log data and application data into HDFS.
- Developed a POC using Scala, deployed it on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Involved in converting Hive queries into Spark transformations using Spark RDDs, Python and Scala.
- Worked with various file formats (Text, Avro, Parquet) and compression codecs (Snappy, bz2, gzip).
- Implemented test scripts to support test driven development and continuous integration.
- Scheduled cron jobs for filesystem checks using fsck and wrote shell scripts to generate alerts.
- Performed data scrubbing and processing with Oozie.
- Provided technical support for the Research in Information Technology program.
- Managed and upgraded Linux and OS X server systems.
- Responsible for the installation, configuration, and management of Linux systems.
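As an illustration of the Spark SQL / Hive querying described in this role, here is a minimal Scala sketch, assuming a Hive-enabled SparkSession; the table and column names (transactions, customers, amount, region) are hypothetical.

```scala
// Minimal sketch, not project code: run an ad-hoc aggregation over Hive tables
// through Spark SQL. Table and column names are hypothetical placeholders.
import org.apache.spark.sql.SparkSession

object HiveAdHocAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-adhoc-aggregation")
      .enableHiveSupport()   // read tables registered in the Hive metastore
      .getOrCreate()

    // Join two Hive tables and aggregate per region.
    val result = spark.sql(
      """SELECT c.region, COUNT(*) AS txn_count, SUM(t.amount) AS total_amount
        |FROM transactions t
        |JOIN customers c ON t.customer_id = c.customer_id
        |GROUP BY c.region""".stripMargin)

    result.show(20)
    spark.stop()
  }
}
```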
Environment: Hadoop, Java, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Python, Spark, Impala, Scala, Kafka, Shell Scripting, Eclipse, Cloudera, MySQL, Talend, Cassandra
Confidential, Conshohocken, PA
Hadoop Developer/Data Engineer
Responsibilities:
- Good understanding of and related experience with Hadoop stack internals, Hive, Pig, and MapReduce; experience setting up clusters using Cloudera Manager.
- Wrote MapReduce jobs to parse the web logs stored in HDFS.
- Involved in optimizing Hive queries and joins to get better results for Hive ad-hoc queries.
- Explored Spark to improve performance and optimize the existing algorithms in Hadoop.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Processed web server logs by developing multi-hop Flume agents using Avro sinks, loaded the data into MongoDB for further analysis, and also extracted files from MongoDB through Flume for processing.
- Involved in creating Oozie workflow and Coordinator jobs for Hive jobs to kick off the jobs on time for data availability.
- Generated reports from these Hive tables for visualization purposes.
- Experience setting up fan-out workflows in Flume to design a V-shaped architecture that takes data from many sources and ingests it into a single sink.
- Understanding of managed distributions of Hadoop, like Cloudera and Hortonworks.
- Developed Spark projects in Scala and executed them using spark-submit.
- Hands-on experience developing MapReduce programs with Apache Hadoop for analyzing big data.
- Expertise in implementing ad-hoc MapReduce jobs using Pig scripts.
- Experience importing and exporting data between RDBMS and HDFS, Hive tables, and HBase using Sqoop.
- Wrote Flume configuration files for importing streaming log data into HBase with Flume.
- Imported transactional logs from web servers with Flume to ingest the data into HDFS; used Flume with a spooling-directory source to load data from the local file system (LFS) into HDFS.
- Wrote and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, as well as DML and DDL.
- Experience importing streaming data into HDFS using Flume sources and sinks, and transforming the data using Flume interceptors.
- Knowledge of handling Kafka clusters; created several topologies to support real-time processing requirements.
- Hands-on experience migrating complex MapReduce programs to Apache Spark RDD transformations (see the sketch after this list).
- Design and programming experience developing internet applications using Java, J2EE, JSP, MVC, Servlets, Struts, Hibernate, JDBC, JSF, EJB, XML, AJAX, and web-based development tools.
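A minimal sketch of the MapReduce-to-Spark RDD migration referenced above: counting web requests per HTTP status code from logs on HDFS. The log layout (space-delimited common log format) and the paths are assumptions for illustration only.

```scala
// Minimal sketch, not project code: a MapReduce-style web-log count expressed
// as Spark RDD transformations. Paths and log layout are assumed (common log format).
import org.apache.spark.{SparkConf, SparkContext}

object WebLogStatusCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("weblog-status-counts"))

    // Raw web server logs previously ingested into HDFS (e.g. via Flume).
    val logs = sc.textFile("hdfs:///data/weblogs/*.log")

    // Map: split each line; filter: drop malformed records;
    // reduce: count requests per HTTP status code (9th field in common log format).
    val statusCounts = logs
      .map(_.split(" "))
      .filter(_.length > 8)
      .map(fields => (fields(8), 1))
      .reduceByKey(_ + _)

    statusCounts.saveAsTextFile("hdfs:///data/weblogs/status_counts")
    sc.stop()
  }
}
```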
Environment: Hadoop, HDFS, MapReduce, Spark, Pig, Hive, Sqoop, Flume, Kafka, HBase, Oozie, Java, SQL scripting, Linux shell scripting, Eclipse and Cloudera.
Confidential, Charlotte, NC
DW/ BI Developer
Responsibilities:
- Designed the target schema definition and ETL jobs using DataStage.
- Used DataStage Director to view and clear logs and to validate jobs.
- Mapped data items from source systems to the target system.
- Tuned the performance of ETL jobs.
- Involved in creating Stored Procedures, views, tables, constraints.
- Generated reports from the cubes by connecting to Analysis server from SSRS.
- Designed and created Report templates, bar graphs and pie charts.
- Modified and enhanced existing SSRS reports.
Environment: MS SQL Server Enterprise 2000, Infosphere Datastage 7.5, Oracle, XML, Unix Shell Script.
Confidential, Wayne, PA
SQL BI Developer
Responsibilities:
- Worked with the client to understand and analyze business requirements and provide possible technical solutions.
- Reviewed and modified software programs to ensure their technical accuracy and reliability.
- Translated business requirements into software applications and models.
- Worked with database objects such as tables, views, synonyms, sequences and database links as well as custom packages tailored to business requirements.
- Built complex queries using SQL and wrote stored procedures using PL/SQL.
- Used bulk collections, indexes, and materialized views to improve query execution.
- Ensured compliance with standards and conventions in developing programs.
- Created SQL scripts to convert legacy data (including validations) and load it into the target tables.
- Created complex stored procedures, triggers, functions, indexes, tables, views, joins, and other SQL code to implement business rules.
- Performed System Acceptance Testing (SAT) and User Acceptance Testing (UAT) for databases. Performed unit testing and path testing for application.
- Analyzed available data from MS Excel, MS Access and SQL server.
- Involved in implementing the data integrity validation checks.
- Resolved and troubleshot complex issues.
Environment: MS SQL Server 2005/2008, Windows 2003/2008, SSIS, SQL Server Management Studio, SQL Server Business Intelligence Studio, SQL Profiler, Microsoft Excel and Access.
Confidential, OH
SQL Developer
Responsibilities:
- Accountable for the business definition and integration of source systems with the data warehouse to consolidate vendor funds and deliver the required retail reporting across the enterprise.
- Researched existing systems and processes, developed / documented business requirements and held meetings with Finance, IT and Retail Operations teams to complete the development schedule.
- Designed and developed summary/aggregate tables using SQL Server procedures and views.
- Gathered and understood the reporting design requirements from the architects and business analysts.
- Interacted with SMEs and end users for requirements gathering, process improvement, problem identification, project analysis and review meetings, and progress reporting.
- Designed SSIS packages for automated data migration, transforming data from SQL Server 2000 to SQL Server 2008.
- Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS).
- Created several dashboards and scorecards with Key Performance Indicators (KPIs) in SQL Server Analysis Services (SSAS) 2005/2008.
- Well versed in defining, creating, and handling data sources, data source views, and parameterized reports in SSRS 2008.
- Designed OLAP cubes, data models, dashboard reports, and scorecards according to the business requirements.
- Served as technical expert guiding choices to implement analytical and reporting solutions for complex financial scenarios.
- Created/Updated Stored Procedures, Triggers, Functions, Views, and Indexes with extensive use of T-SQL.
- Designed and developed SSIS (ETL) packages for loading data from Oracle and flat files (3 GB) into the SQL Server database.
Environment: SQL Server 2005/2008, SQL BI Suite (SSIS, SSAS, SSRS), Enterprise Manager, PPS, XML, MS PowerPoint, MS Project, MS Access 2003 & Windows Server 2003.