Big Data Hadoop Developer Resume
Pittsburgh, PA
SUMMARY:
- 11 years of extensive IT experience, including work as a Big Data/Hadoop developer, master-slave architecture designer, and Hive/Pig developer.
- Experience in Big Data engineering and analytics using the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Oozie, and Sqoop.
- Experienced in processing large datasets of different forms including structured, semi-structured and unstructured data.
- Expertise in SQL Server 2005/2008 Reporting Services (SSRS) and SQL Server 2005 Integration Services (SSIS) for Business Intelligence (BI).
- Hands-on experience with Cloudera and with multi-node clusters on the Hortonworks Sandbox.
- Expertise in designing tables in Hive and MySQL, and in importing and exporting data between relational databases and HDFS using Sqoop.
- Experienced in data architecture, including data ingestion pipeline design, Hadoop architecture, data modeling, machine learning, and advanced data processing.
- Experience optimizing ETL workflows that process data arriving from multiple sources.
- ETL experience covering data extraction, management, aggregation, and loading into HBase.
- Expertise in developing Pig Latin scripts and Hive Query Language (HiveQL) queries.
- Extensive knowledge of ZooKeeper for various types of centralized configuration.
- Experienced in integrating various data sources, including RDBMS via Java/JDBC, shell scripting, spreadsheets, and text files.
- Experience in managing and reviewing Hadoop log files using Flume and Kafka; also developed Pig UDFs and Hive UDFs to pre-process data for analysis.
- Hands-on experience with Spark for handling streaming data (an illustrative sketch follows this list).
- Hands-on experience with Spring Tool Suite for developing Scala applications.
- Shell scripting to load and process data from various Enterprise Resource Planning (ERP) sources.
- Hands-on experience writing Pig Latin scripts that the Pig interpreter runs as MapReduce jobs.
- Expertise in Hadoop components such as YARN, Pig, Hive, HBase, Flume, and Oozie, plus shell scripting in Bash.
- Good understanding of Hadoop architecture and its daemons, including NameNode, DataNode, JobTracker, TaskTracker, and ResourceManager.
- Hands-on experience ingesting data into a data warehouse using various data loading techniques.
- Good knowledge of data modeling and data mining to model data per business requirements.
- Hands-on experience with the MapReduce and Pig programming models, and with installing and configuring Hadoop, HBase, Hive, Pig, Sqoop, and Flume using Linux commands.
- Working knowledge of Agile and waterfall development models.
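As an illustration of the Kafka/Spark streaming experience noted above, the following is a minimal Scala sketch of streaming log messages from a Kafka topic into HDFS with Spark Structured Streaming. The broker address, topic name, and paths are hypothetical placeholders, not taken from any specific project, and the job assumes the spark-sql-kafka connector is on the classpath.

```scala
// Minimal sketch, not project code: stream Kafka log messages into HDFS.
// Broker, topic, and paths are hypothetical placeholders.
import org.apache.spark.sql.SparkSession

object KafkaLogsToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-logs-to-hdfs")
      .getOrCreate()

    // Read the raw message value from the Kafka topic as text lines.
    val logs = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "server-logs")
      .load()
      .selectExpr("CAST(value AS STRING) AS line")

    // Continuously append the lines to an HDFS directory.
    val query = logs.writeStream
      .format("text")
      .option("path", "hdfs:///data/raw_logs")
      .option("checkpointLocation", "hdfs:///checkpoints/raw_logs")
      .start()

    query.awaitTermination()
  }
}
```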
TECHNICAL SKILLS:
Big Data: Cloudera Distribution, HDFS, ZooKeeper, YARN, DataNode, NameNode, ResourceManager, NodeManager, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, Spark, Storm, Scala, Impala
Operating System: Windows, Linux, Unix.
Languages: Java, J2EE, SQL, Python, Scala
Databases and SQL: MS SQL Server 2012/2008, Oracle 11g (PL/SQL) and MySQL 5.6, MongoDB
Web Technologies: JSP, Servlets, HTML, CSS, JDBC, SOAP, XSLT.
Version Tools: GIT, SVN, CVS
IDE: IBM RAD, Eclipse, IntelliJ
Tools: TOAD, SQL Developer, ANT, Log4J
Web Services: WSDL, SOAP.
ETL: Talend ETL, Talend Studio
Web/App Server: UNIX server, Apache Tomcat
PROFESSIONAL EXPERIENCE:
Confidential, Pittsburgh, PA
Big Data Hadoop Developer
Responsibilities:
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
- Worked on the Hortonworks Data Platform (HDP) Hadoop distribution, using Hive to store, retrieve, and query data.
- Implemented Hive optimized joins to gather data from different sources and run ad-hoc queries on them.
- Implemented custom aggregate functions using Spark SQL and performed interactive querying.
- Coordinated with Hortonworks and with the development and operations teams on platform-level issues.
- Extensively worked on combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
- Worked with Spark SQL and DataFrames for faster execution of Hive queries via the Spark SQL context (see the illustrative sketch after this list).
- Used Sqoop to transfer data between relational databases and HDFS, and used Kafka to stream log data from servers.
- Used Pig to perform data transformations, event joins, filtering, and pre-aggregations before storing the data on HDFS.
- Implemented various analytical algorithms as MapReduce programs applied to data in HDFS.
- Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication and schema design.
- Familiar with MongoDB write concern to avoid loss of data during system failures.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Extensively performed CRUD operations (put, get, scan, delete, update) on the HBase database.
- Wrote Hive generic UDFs to perform business logic operations at the table level.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and preprocessing with Pig, Hive, Sqoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Used Hive join queries to join multiple tables of a source system and load the results into Elasticsearch.
- Used Apache Kafka as a messaging system to load log data and application data into HDFS.
- Developed a POC using Scala, deployed it on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Involved in converting Hive queries into Spark transformations using Spark RDDs, Python and Scala.
- Worked with various file formats (Text, Avro, Parquet) and compression codecs (Snappy, bz2, gzip).
- Implemented test scripts to support test driven development and continuous integration.
- Scheduled cron jobs for filesystem checks using fsck and wrote shell scripts to generate alerts.
- Performed data scrubbing and processing with Oozie.
- Provided technical support for the Research in Information Technology program.
- Managed and upgraded Linux and OS X server systems.
- Responsible for the installation, configuration, and management of Linux systems.
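As an illustration of the Spark SQL / Hive querying described in this role, here is a minimal Scala sketch, assuming a Hive-enabled SparkSession; the table and column names (transactions, customers, amount, region) are hypothetical.

```scala
// Minimal sketch, not project code: run an ad-hoc aggregation over Hive tables
// through Spark SQL. Table and column names are hypothetical placeholders.
import org.apache.spark.sql.SparkSession

object HiveAdHocAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-adhoc-aggregation")
      .enableHiveSupport()   // read tables registered in the Hive metastore
      .getOrCreate()

    // Join two Hive tables and aggregate per region.
    val result = spark.sql(
      """SELECT c.region, COUNT(*) AS txn_count, SUM(t.amount) AS total_amount
        |FROM transactions t
        |JOIN customers c ON t.customer_id = c.customer_id
        |GROUP BY c.region""".stripMargin)

    result.show(20)
    spark.stop()
  }
}
```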
Environment: Hadoop, Java, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Python, Spark, Impala, Scala, Kafka, Shell Scripting, Eclipse, Cloudera, MySQL, Talend, Cassandra
Confidential, Conshohocken, PA
Hadoop Developer/Data Engineer
Responsibilities:
- Good understanding of and related experience with Hadoop stack internals, Hive, Pig, and MapReduce; experience setting up clusters using Cloudera Manager.
- Wrote MapReduce jobs to parse the web logs stored in HDFS.
- Involved in optimizing Hive queries and joins to get better results for Hive ad-hoc queries.
- Explored Spark to improve performance and optimize the existing algorithms in Hadoop.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Processed web server logs by developing multi-hop Flume agents using Avro sinks, loaded the data into MongoDB for further analysis, and also extracted files from MongoDB through Flume for processing.
- Involved in creating Oozie workflow and Coordinator jobs for Hive jobs to kick off the jobs on time for data availability.
- Generated reports from these Hive tables for visualization purposes.
- Experience setting up fan-out workflows in Flume to design a V-shaped architecture that takes data from many sources and ingests it into a single sink.
- Understanding of managed distributions of Hadoop, like Cloudera and Hortonworks.
- Developed Spark projects in Scala and executed them using spark-submit.
- Hands-on experience developing MapReduce programs with Apache Hadoop for analyzing big data.
- Expertise in implementing ad-hoc MapReduce jobs using Pig scripts.
- Experience importing and exporting data between RDBMS and HDFS, Hive tables, and HBase using Sqoop.
- Wrote Flume configuration files for importing streaming log data into HBase with Flume.
- Imported transactional logs from web servers with Flume to ingest the data into HDFS; used Flume with a spooling-directory source to load data from the local file system (LFS) into HDFS.
- Wrote and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, as well as DML and DDL.
- Experience importing streaming data into HDFS using Flume sources and sinks, and transforming the data using Flume interceptors.
- Knowledge of handling Kafka clusters; created several topologies to support real-time processing requirements.
- Hands-on experience migrating complex MapReduce programs to Apache Spark RDD transformations (see the sketch after this list).
- Design and programming experience developing internet applications using Java, J2EE, JSP, MVC, Servlets, Struts, Hibernate, JDBC, JSF, EJB, XML, AJAX, and web-based development tools.
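A minimal sketch of the MapReduce-to-Spark RDD migration referenced above: counting web requests per HTTP status code from logs on HDFS. The log layout (space-delimited common log format) and the paths are assumptions for illustration only.

```scala
// Minimal sketch, not project code: a MapReduce-style web-log count expressed
// as Spark RDD transformations. Paths and log layout are assumed (common log format).
import org.apache.spark.{SparkConf, SparkContext}

object WebLogStatusCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("weblog-status-counts"))

    // Raw web server logs previously ingested into HDFS (e.g. via Flume).
    val logs = sc.textFile("hdfs:///data/weblogs/*.log")

    // Map: split each line; filter: drop malformed records;
    // reduce: count requests per HTTP status code (9th field in common log format).
    val statusCounts = logs
      .map(_.split(" "))
      .filter(_.length > 8)
      .map(fields => (fields(8), 1))
      .reduceByKey(_ + _)

    statusCounts.saveAsTextFile("hdfs:///data/weblogs/status_counts")
    sc.stop()
  }
}
```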
Environment: Hadoop, HDFS, MapReduce, Spark, Pig, Hive, Sqoop, Flume, Kafka, HBase, Oozie, Java, SQL scripting, Linux shell scripting, Eclipse and Cloudera.
Confidential, Charlotte, NC
DW/ BI Developer
Responsibilities:
- Designed the target schema definition and ETL jobs using DataStage.
- Used DataStage Director to view and clear logs and to validate jobs.
- Mapped data items from source systems to the target system.
- Tuned the performance of ETL jobs.
- Involved in creating Stored Procedures, views, tables, constraints.
- Generated reports from the cubes by connecting to Analysis server from SSRS.
- Designed and created Report templates, bar graphs and pie charts.
- Modified and enhanced existing SSRS reports.
Environment: MS SQL Server Enterprise 2000, Infosphere Datastage 7.5, Oracle, XML, Unix Shell Script.
Confidential, Wayne, PA
SQL BI Developer
Responsibilities:
- Worked with the client to understand and analyze business requirements and provide possible technical solutions.
- Reviewed and modified software programs to ensure their technical accuracy and reliability.
- Translated business requirements into software applications and models.
- Worked with database objects such as tables, views, synonyms, sequences and database links as well as custom packages tailored to business requirements.
- Built complex queries using SQL and wrote stored procedures using PL/SQL.
- Used bulk collections, indexes, and materialized views to improve query execution.
- Ensured compliance with standards and conventions in developing programs.
- Created SQL scripts to convert legacy data (including validations) and load it into the target tables.
- Created complex stored procedures, triggers, functions, indexes, tables, views, joins, and other SQL code to implement business rules.
- Performed System Acceptance Testing (SAT) and User Acceptance Testing (UAT) for databases. Performed unit testing and path testing for application.
- Analyzed available data from MS Excel, MS Access and SQL server.
- Involved in implementing the data integrity validation checks.
- Resolved and troubleshot complex issues.
Environment: MS SQL Server 2005/2008, Windows 2003/2008, SSIS, SQL Server Management Studio, SQL Server Business Intelligence Studio, SQL Profiler, Microsoft Excel and Access.
Confidential, OH
SQL Developer
Responsibilities:
- Accountable for the business definition and integration of source systems with the data warehouse to consolidate vendor funds and deliver the required retail reporting across the enterprise.
- Researched existing systems and processes, developed / documented business requirements and held meetings with Finance, IT and Retail Operations teams to complete the development schedule.
- Designed and developed summary/aggregate tables using SQL Server procedures and views.
- Gathered and understood the reporting design requirements from the architects and business analysts.
- Interacted with SMEs and end users for requirements gathering, process improvement, problem identification, project analysis and review meetings, and progress reporting.
- Designed SSIS packages for automated data migration, transforming data from SQL Server 2000 to SQL Server 2008.
- Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS).
- Created several dashboards and scorecards with Key Performance Indicators (KPIs) in SQL Server Analysis Services (SSAS) 2005/2008.
- Well versed in defining, creating, and handling data sources, data source views, and parameterized reports in SSRS 2008.
- Designed OLAP cubes, data models, dashboard reports, and scorecards according to the business requirements.
- Served as technical expert guiding choices to implement analytical and reporting solutions for complex financial scenarios.
- Created/Updated Stored Procedures, Triggers, Functions, Views, and Indexes with extensive use of T-SQL.
- Designed and developed SSIS (ETL) packages for loading data from Oracle and flat files (3 GB) into the SQL Server database.
Environment: SQL Server 2005/2008, SQL BI Suite (SSIS, SSAS, SSRS), Enterprise Manager, PPS, XML, MS PowerPoint, MS Project, MS Access 2003 & Windows Server 2003.