
Hadoop/Big Data Developer Resume

Chicago

SUMMARY

  • 8+ years of IT experience in analysis, design, and development using Big Data/Hadoop technologies such as AWS, Cloudera, Hortonworks, Hive, Impala, Sqoop, Oozie, Kafka, HBase, Avro, MapReduce, data processing and data lake architecture, Java/J2EE, and SQL, as well as Apache Hadoop components such as Pig, Scala, Spark, Impala, Oozie, Flume, and HCatalog.
  • Proficient in Oracle packages, procedures, functions, triggers, views, SQL*Loader, performance tuning, UNIX shell scripting, and data architecture.
  • Proficient in project management, technical project delivery, estimation and planning, coding and execution, and status reporting activities.
  • Proficient in requirement analysis, including AS-IS and TO-BE process mapping and documentation of all business processes, technical requirements, and the project as a whole.
  • Used Flume to collect data and populate Hadoop.
  • Architected, designed, and developed Big Data solutions for various implementations.
  • Worked on data modeling using various machine learning algorithms via R and Python (GraphLab); worked with programming languages such as Core Java and Scala.
  • Worked on HBase for quick lookups, updates, inserts, and deletes in Hadoop.
  • Experience in data modeling, complex data structures, data processing, data quality, and the data life cycle.
  • Knowledge on administrative tasks such as installing Hadoop and its ecosystem components such as Hive and Pig.
  • Good experience developing and deploying applications on WebLogic, Apache Tomcat, and JBoss.
  • Strong experience with SQL, PL/SQL, and database concepts.
  • Experience with NoSQL databases such as HBase and Cassandra.
  • Very good understanding of job workflow scheduling and monitoring tools such as Oozie and Control-M.
  • Developed ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools.
  • Good knowledge of YARN (MRv2) architecture.
  • Experience in running Map-Reduce and Spark jobs over YARN.
  • Hands-on experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
  • Participated in design reviews, code reviews, unit testing and integration testing.
  • Worked with HDFS, NameNode, JobTracker, DataNode, TaskTracker, and MapReduce concepts.
  • Experience in writing UNIX shell scripts.
  • Analyzed data using HiveQL, Pig Latin, and MapReduce programs in Java.
  • Wrote custom UDFs to extend Hive and Pig core functionality (see the sketch after this list).
  • Worked on managing and reviewing Hadoop log files.
  • Used Sqoop to move data from relational databases into Hadoop.
  • Hands-on experience and good knowledge of the real-time data feed platform Kafka and integration software such as Talend.
  • Good knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Experience using Sqoop to import and export data between RDBMS and HDFS.
  • Solid understanding of the cluster monitoring tool Cloudera Manager.
  • Experience using Pig, Hive, and HBase, including Hive built-in functions, partitioning, bucketing, and different types of joins on Hive tables, as well as HDFS design, daemons, and HDFS high availability (HA).
  • Familiar with data warehousing and ETL tools like Informatica.
  • Familiar with Core Java, with a strong understanding and working knowledge of object-oriented concepts, collections, multithreading, data structures, algorithms, JSP, Servlets, JDBC, and HTML.
  • Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with strong problem-solving and leadership skills and the ability to work well with people and maintain good relationships across the organization.
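
For illustration, a minimal sketch of the kind of custom Hive UDF referenced above; the package, class, and masking logic are hypothetical:

    // Hypothetical Hive UDF that masks all but the last four characters of a value.
    // Packaged as a JAR, it would be registered with, for example:
    //   ADD JAR mask_udf.jar;
    //   CREATE TEMPORARY FUNCTION mask_id AS 'com.example.hive.udf.MaskId';
    package com.example.hive.udf;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class MaskId extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                          // pass NULLs through unchanged
            }
            String s = input.toString();
            int keep = Math.min(4, s.length());
            StringBuilder masked = new StringBuilder();
            for (int i = 0; i < s.length() - keep; i++) {
                masked.append('*');
            }
            masked.append(s.substring(s.length() - keep));
            return new Text(masked.toString());
        }
    }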

TECHNICAL SKILLS

Hadoop/Big Data, Teradata Technologies: HDFS, MapReduce, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, ZooKeeper, AWS, Cloudera, Hortonworks, Kafka, Avro, BigQuery, ActiveMQ

Languages: Java, Python, NumPy, XML, HTML, and HiveQL.

J2EE Technologies: Servlets, JSP, JMS, JSTL, AJAX, DOJO, JSON and Blaze DS.

Frameworks: Spring 2, Struts 2 and Hibernate 3.

XML Processing: JAXB

Reporting Tools: BIRT 2.2, SSRS

Application & Web Services: WebSphere 6.0, JBoss 4.X and Tomcat 5.

Scripting Languages: JavaScript, AngularJS, Pig Latin, Python 2.7, and Scala.

Databases (SQL/NoSQL): Oracle 9i, SQL Server 2005, MySQL, HBase, MongoDB 2.2, MS SQL

IDE: Eclipse and EditPlus.

PM Tools: MS MPP, Risk Management, ESA

Other Tools: SVN, Apache Ant, JUnit, StarUML, TOAD, PL/SQL Developer, Perforce, JIRA, Bugzilla, Visual Source, QC, Agile Methodology.

EAI Tools: TIBCO 5.6.

Bug Tracking/ Ticketing: Mercury Quality Center and Service Now.

Operating System: Windows 98/2000, Linux/UNIX, and Mac.

PROFESSIONAL EXPERIENCE

Confidential, Chicago

Hadoop/Big Data Developer

Responsibilities:

  • Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, and SPLIT to extract data from data files and load it into HDFS.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a minimal sketch follows this list).
  • Worked on installing and configuring HDP (Hortonworks) 2.x and Cloudera (CDH 5.5.1) clusters in Dev and Production environments.
  • Performed volumetric analysis for 43 feeds (current approximate data size of 70 TB), based on which the size of the production cluster is to be decided.
  • Wrote multiple Spark jobs to perform data quality checks on data before files were moved to the data processing layer.
  • Worked on Capacity planning for the Production Cluster.
  • Installed HUE Browser.
  • Involved in loading data from UNIX file system to HDFS.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Worked on the installation of Hortonworks 2.1 on Azure Linux servers.
  • Worked on upgrading the Hadoop cluster from HDP 2.1 to HDP 2.3.
  • Responsible for the implementation and ongoing administration of Hadoop infrastructure.
  • Managed and reviewed Hadoop log files.
  • Imported and exported data between relational databases such as MySQL and HDFS/HBase using Sqoop.
  • Worked on indexing HBase tables using Solr, including JSON and nested data.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, and managing and reviewing data backups and log files.
  • Day-to-day responsibilities included solving developer issues, deploying code from one environment to another, providing access to new users, providing quick solutions to reduce impact, documenting them, and preventing future issues.
  • Added/installed new components and removed them through Ambari.
  • Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades.
  • Monitored workload, job performance, and capacity planning.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Created and deployed corresponding SolrCloud collections.
  • Created collections and configurations and registered Lily HBase Indexer configurations with the Lily HBase Indexer Service.
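
A minimal sketch of the kind of data-cleaning MapReduce job in Java described above; the field count, delimiter, and input/output paths are hypothetical:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CleanRecordsJob {

        // Map-only job: keep well-formed pipe-delimited records, trim fields, drop the rest.
        public static class CleanMapper
                extends Mapper<LongWritable, Text, NullWritable, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\\|", -1);
                if (fields.length != 5 || fields[0].isEmpty()) {
                    context.getCounter("clean", "bad_records").increment(1);
                    return;                                    // skip malformed rows
                }
                StringBuilder out = new StringBuilder();
                for (int i = 0; i < fields.length; i++) {
                    if (i > 0) out.append('|');
                    out.append(fields[i].trim());
                }
                context.write(NullWritable.get(), new Text(out.toString()));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "clean-records");
            job.setJarByClass(CleanRecordsJob.class);
            job.setMapperClass(CleanMapper.class);
            job.setNumReduceTasks(0);                          // map-only cleaning pass
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }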

Environment: Hadoop, MapReduce, HDFS, HBase, HDP (Hortonworks), Sqoop, data processing layer, Hue, Azure, UNIX, MySQL, RDBMS, Ambari, SolrCloud, Cloudera, Lily HBase, cron.

Confidential

Hadoop/Big Data Developer

Responsibilities:

  • Driving digital products in the bank, including IoT for the campaigning system and blockchain for payments and trading.
  • Defining architecture standards, big data principles, and PADS across the program, and using VP for modeling.
  • Developed Pig scripts to transform data and load it into HBase tables.
  • Developed Hive scripts for implementing dynamic partitions (see the sketch after this list).
  • Created Hive snapshot tables and Hive ORC tables from existing Hive tables.
  • In the data processing layer, data is finally stored in Hive tables in ORC file format using Spark SQL. In this layer, logic for maintaining SCD Type 2 is implemented for non-transactional incremental feeds.
  • Developed a rule engine that adds columns to existing data based on business rules specified by reference data provided by the business.
  • Optimized Hive joins for large tables and developed MapReduce code for the full outer join of two large tables.
  • Used Spark to parse XML files, extract values from tags, and load them into multiple Hive tables using map classes.
  • Used HDFS and MySQL, and deployed HBase integration to perform OLAP operations on HBase data.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Talend Big Data Open Studio 5.6.2 to create a framework for executing extract jobs.
  • Monitored workload, job performance, and capacity planning.
  • Implemented partitioning and bucketing techniques in Hive.
  • Used various Talend big data components such as throw, tHiveInput, tHDFSCopy, tHDFSPut, tHDFSGet, tMap, tDenormalize, tFlowToIterate, etc.
  • Scheduled Talend jobs using TAC (Talend Admin Console).
  • Worked on evaluation and analysis of the Hadoop cluster and different big data analytic tools, including HBase; developed MapReduce programs to perform data filtering on unstructured data.
  • Loaded data from the UNIX file system to HDFS and wrote Hive user-defined functions.
  • Developed code to pre-process large sets of various types of file formats such as Text, Avro, Sequence files, XML, JSON, and Parquet.
  • Created multi-stage MapReduce jobs in Java for ad-hoc purposes.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Added/installed new components and removed them through Ambari.
  • Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades.
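
For illustration, a sketch of loading a dynamically partitioned Hive ORC table over the Hive JDBC interface, in the spirit of the Hive scripts mentioned above; the host, credentials, table, and column names are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class OrcDynamicPartitionLoad {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://hiveserver2-host:10000/default", "etl_user", "");
                 Statement stmt = conn.createStatement()) {

                // Allow fully dynamic partitioning for this session.
                stmt.execute("SET hive.exec.dynamic.partition = true");
                stmt.execute("SET hive.exec.dynamic.partition.mode = nonstrict");

                stmt.execute("CREATE TABLE IF NOT EXISTS sales_orc ("
                        + " order_id BIGINT, customer_id BIGINT, amount DOUBLE)"
                        + " PARTITIONED BY (load_date STRING)"
                        + " STORED AS ORC");

                // The partition column comes last in the SELECT list, so each row
                // lands in the partition matching its load_date value.
                stmt.execute("INSERT OVERWRITE TABLE sales_orc PARTITION (load_date)"
                        + " SELECT order_id, customer_id, amount, load_date"
                        + " FROM sales_staging");
            }
        }
    }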

Environment: Hadoop, MapReduce, TAC, HDFS, HBase, HDP (Hortonworks), Sqoop, Spark SQL, Hive ORC, data processing layer, Hue, Azure, UNIX, MySQL, RDBMS, Ambari, SolrCloud, Lily HBase, cron, JSON, XML, Parquet.

Confidential

Hadoop/Big Data Developer

Responsibilities:

  • Worked with data governance, data quality, data lineage, and data architects to design various models and processes.
  • Worked with several R packages, including knitr, dplyr, SparkR, CausalInfer, and spacetime.
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop, and MongoDB.
  • Involved in test data preparation using black-box testing techniques (such as BVA and ECP).
  • Gathering all the data that is required from multiple data sources and creating datasets that will be used in the analysis.
  • Performed exploratory data analysis and data visualization using R and Tableau.
  • Performed proper EDA with univariate and bivariate analysis to understand intrinsic and combined effects.
  • Developed, implemented, and maintained conceptual, logical, and physical data models using Erwin for forward/reverse-engineered databases.
  • Independently coded new programs and designed tables to load and test them effectively for the given POCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
  • Coded R functions to interface with the Caffe deep learning framework.
  • Performed data cleaning and imputation of missing values using R.
  • Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
  • Took up ad-hoc requests from different departments and locations.
  • Used Hive to store the data and perform data cleaning steps for huge datasets.
  • Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
  • Created customized business reports and shared insights with management.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Worked in the Amazon Web Services cloud computing environment.
  • Wrote complex HiveQL queries and Pig scripts that pull data as per the requirements to perform data validation against report output.
  • Came up with data load and security strategies and workflow designs with the help of administrators and other developers.
  • Used Tableau to automatically generate reports.
  • Worked with partially adjudicated insurance flat files, internal records, third-party data sources, JSON, XML, and more.
  • Developed automated workflows to schedule jobs using Oozie.
  • Developed a technique to incrementally update Hive tables, a feature not supported by Hive at the time (see the sketch after this list).
  • Created metrics and executed unit tests on input, output, and intermediate data.
  • Led the testing team and meetings with the onshore team for requirements gathering.
  • Assisted the team in creating documents detailing the cluster setup process.
  • Established data architecture strategy, best practices, standards, and roadmaps.
  • Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team.
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions addressing those needs.
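
A sketch of one common way to emulate incremental Hive table updates, along the lines of the technique mentioned above: union the base table with the new delta and keep the latest row per key. Table and column names are hypothetical, and this reconciliation pattern is an illustrative workaround, not necessarily the exact technique used.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HiveIncrementalRefresh {
        public static void main(String[] args) throws Exception {
            String[] steps = {
                // Assumes today's extract has already been loaded into customers_delta
                // (for example via Sqoop). Reconcile: keep the newest row per id.
                "CREATE TABLE IF NOT EXISTS customers_tmp LIKE customers_base",
                "INSERT OVERWRITE TABLE customers_tmp "
                    + "SELECT id, name, address, modified_ts FROM ("
                    + "  SELECT u.*, ROW_NUMBER() OVER (PARTITION BY id "
                    + "         ORDER BY modified_ts DESC) AS rn "
                    + "  FROM (SELECT * FROM customers_base "
                    + "        UNION ALL SELECT * FROM customers_delta) u) ranked "
                    + "WHERE rn = 1",
                // Swap the reconciled data back into the base table.
                "INSERT OVERWRITE TABLE customers_base SELECT * FROM customers_tmp"
            };
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://hiveserver2-host:10000/default", "etl_user", "");
                 Statement stmt = conn.createStatement()) {
                for (String sql : steps) {
                    stmt.execute(sql);
                }
            }
        }
    }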

Environment: Hadoop, MapReduce, HDFS, HBase, HDP (Hortonworks), Sqoop, data processing layer, Hue, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, SolrCloud, Lily HBase, cron.

Confidential

Hadoop/Big Data Developer

Responsibilities:

  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Ran MapReduce programs on the cluster.
  • Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Configured the Hadoop cluster with a NameNode and slave nodes and formatted HDFS.
  • Imported and exported data between Oracle and HDFS/Hive using Sqoop.
  • Performed source data ingestion, cleansing, and transformation in Hadoop.
  • Supported Map-Reduce Programs running on the cluster.
  • Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
  • Used Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed the partitioned and bucketed data and computed various metrics for reporting.
  • Developed Hive queries for Analysis across different banners.
  • Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and uploaded it to the database (see the sketch after this list).
  • Created HBase tables to store various formats of data coming from different portfolios.
  • Worked on improving the performance of existing Pig and Hive Queries.
  • Involved in developing Hive UDFs and reusing them for other requirements; worked on performing join operations.
  • Developed SerDe classes.
  • Developed histograms using R.
  • Developed fingerprinting rules in Hive that help uniquely identify a driver profile.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
  • Exported the result set from Hive to MySQL using Sqoop after processing the data.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Used Hive to partition and bucket data.
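
For illustration, a minimal sketch of pulling tweets through the Twitter API with the Twitter4J library and keeping the raw JSON for downstream loading, along the lines described above; the search term and output handling are hypothetical, and credentials would come from the Twitter4J configuration (e.g. twitter4j.properties):

    import twitter4j.Query;
    import twitter4j.QueryResult;
    import twitter4j.Status;
    import twitter4j.Twitter;
    import twitter4j.TwitterFactory;
    import twitter4j.TwitterObjectFactory;

    public class TwitterExtract {
        public static void main(String[] args) throws Exception {
            Twitter twitter = TwitterFactory.getSingleton();

            Query query = new Query("#bigdata");
            query.setCount(100);                         // tweets per page

            QueryResult result = twitter.search(query);
            for (Status status : result.getTweets()) {
                // Raw JSON is only available when jsonStoreEnabled=true is set
                // in the Twitter4J configuration.
                String json = TwitterObjectFactory.getRawJSON(status);
                System.out.println(status.getUser().getScreenName()
                        + "\t" + status.getCreatedAt()
                        + "\t" + (json != null ? json.length() : 0) + " bytes of JSON");
            }
        }
    }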

Environment: Hadoop, MapReduce, HDFS, HBase, HDP (Hortonworks), Sqoop, data processing layer, Hue, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, SolrCloud, Lily HBase, cron.
