Big Data Engineer Resume
Dallas, TX
SUMMARY
- 6+ years of experience in IT, including analysis, design, and development of Big Data solutions using Hadoop, design and development of web applications using Java and J2EE, and database and data warehousing development using MySQL, Oracle, and Informatica.
- 5+ years of work experience in Big Data analytics, with hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, Flume, Cassandra, Kafka, and Spark, as well as AWS services including EC2, S3, Auto Scaling, IAM, Lambda, and Elastic Load Balancing.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, MapReduce concepts, and the HDFS framework.
- Experience in using Cloudera Manager for installation and management of single-node and multi-node Hadoop clusters (CDH4 & CDH5).
- Experience in data load management and importing & exporting data using Sqoop and Flume.
- Experience in analyzing data using Hive, Pig and custom MR programs in Java.
- Experience with Power BI for importing, shaping, and transforming data for business intelligence (BI).
- Used the AWS Serverless Application Repository to find applications, configured them by setting environment variables and parameter values, then deployed them to the AWS account and managed them from the AWS Management Console.
- Designed an architecture in which streaming, ETL, and batch data are modeled and served simultaneously to Azure SQL Data Warehouse using Azure Databricks.
- Extensive experience in data migration from RDBMS to HDFS, HBase, and Hive using Apache NiFi and Sqoop.
- Worked with Big Data technologies - Hadoop, HDFS, Hive, Oozie, Sqoop, Flume, Pig, HBase, Phoenix, NiFi, Kafka, and Apache Spark.
- Experienced with multiple cloud platforms, including Amazon Web Services and the Azure cloud platform, with first- and third-party integration services including Databricks (on both Azure and AWS).
- Designed and developed multiple interfaces using the DataStage ETL tool. Developed multiple processes using different kinds of adapters (file, database, and message adapters) to integrate different systems.
- Designed and built an Akka cluster driven by RabbitMQ with dynamic, cluster-aware master-worker actor pairs that validate map data consumed from an AWS REST service (a minimal sketch of this pattern follows this summary).
- Documented logical, physical, relational, and dimensional data models. Designed the data marts in dimensional data modeling using star and snowflake schemas.
- Used the Akka framework to enable concurrent processing while loading the data lake.
- Experience in scheduling and monitoring jobs using Oozie and Zookeeper.
- Experienced in writing MapReduce programs and UDFs for both Pig and Hive in Java.
- Experience in dealing with log files to extract data and copy it into HDFS using Flume.
- Developed Hadoop test classes using MRUnit for checking input and output.
- Experience with Splunk Searching and Reporting modules.
- Experience in integrating Hive and HBase for effective operations.
- Developed Pig UDFs to pre-process the data for analysis.
- Experience in Impala, Solr, MongoDB, HBase and Spark.
- Hands on knowledge of writing code in Scala.
- Served as an Oracle PL/SQL professional.
- Experienced in UNIX Shell Scripts to automate load process.
- Developed UNIX scripts for transmitting files from the secure server to the customer specified server, using various FTP batch processes.
- Proficient in Core Java, J2EE, JDBC, Servlets, JSP, Exception Handling, Multithreading, EJB, XML, HTML5, CSS3, JavaScript, AngularJS.
- Used source debuggers and visual development environments.
- Experience in Testing and documenting software for client applications.
- Wrote code to create single-threaded, multi-threaded, and user-interface event-driven applications, both stand-alone and those that access servers or services.
- Good experience in object-oriented programming (OOP) concepts.
- Good experience in using data modeling techniques and deriving results from SQL and PL/SQL queries.
- Good working knowledge on Spring Framework.
- Strong Experience in writing SQL queries.
- Experience working with different databases, such as Oracle, SQL Server, MySQL and writing stored procedures, functions, joins, and triggers for different Data Models.
- Expertise in implementing Service Oriented Architectures (SOA) with XML based Web Services (SOAP/REST).
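As an illustration of the Akka master-worker validation pattern noted above, the following is a minimal Scala sketch only; the actor, message, and payload names are hypothetical, and the RabbitMQ consumer and AWS REST call are replaced by a hard-coded record.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

// Hypothetical message types used only for this sketch.
case class ValidateMapData(payload: String)
case class ValidationResult(payload: String, valid: Boolean)

// Worker: validates a single map-data record (the rule here is a stand-in).
class MapDataWorker extends Actor {
  def receive: Receive = {
    case ValidateMapData(payload) =>
      val valid = payload.nonEmpty && payload.contains("coordinates")
      sender() ! ValidationResult(payload, valid)
  }
}

// Master: fans records out to a small worker pool and collects the results.
class MapDataMaster(workerCount: Int) extends Actor {
  private val workers: Vector[ActorRef] =
    Vector.tabulate(workerCount)(i => context.actorOf(Props(new MapDataWorker), s"worker-$i"))
  private var next = 0

  def receive: Receive = {
    case msg: ValidateMapData =>
      workers(next % workerCount) ! msg // the worker replies to this master
      next += 1
    case ValidationResult(payload, valid) =>
      println(s"validated=$valid payload=$payload")
  }
}

object MapValidationApp extends App {
  val system = ActorSystem("map-validation")
  val master = system.actorOf(Props(new MapDataMaster(workerCount = 4)), "master")

  // In the real pipeline these messages would come from a RabbitMQ consumer and
  // the payloads from an AWS REST service; here a single record is hard-coded.
  master ! ValidateMapData("""{"coordinates":[32.78,-96.80]}""")
}
```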
TECHNICAL SKILLS
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, HBase, Spark
Programming Languages: Java (5, 6, 7), Python, Scala
Databases/RDBMS: MySQL, SQL/PL-SQL, MS-SQL Server 2005, Oracle 9i/10g/11g
Scripting/ Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell
NoSQL/Search Stores: Cassandra, HBase, Elasticsearch
Operating Systems: Linux, Windows XP/7/8
Software Life Cycles: SDLC, Waterfall and Agile models
Office Tools: MS-Office, MS-Project and Risk Analysis tools, Visio
Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SVN, Log4j, SOAP UI, ANT, Maven, Automation and MR-Unit
Cloud Platforms: AWS (EC2, S3, Lambda), Azure
PROFESSIONAL EXPERIENCE
Confidential, Dallas, TX
Bigdata Engineer
Responsibilities:
- Designed and implemented data pipelines that launch Spark clusters equipped with AWS Glue, read datasets from various data sources, perform transformations and analytics, and store the results for the application.
- Responsible for implementing a generic framework to handle different data-collection methodologies from the client's primary data sources, validating and transforming the data using Spark, and loading it into S3.
- Responsible for providing a SQL engine over the data lake in S3 by adopting the Parquet storage format with Spark SQL as the SQL engine (see the sketch at the end of this section).
- Wrote various Spark transformations using Scala to perform data cleansing, validation, and summarization activities on user behavioral data.
- Designed the number of partitions and the replication factor for Kafka topics based on business requirements, and worked on migrating MapReduce programs, initially written in Python (PySpark), into Spark transformations using Spark and Scala.
- Created monitors, alarms, and notifications for EC2 hosts using CloudWatch, CloudTrail, and SNS.
- Developed data pipelines using AWS Lambda for data processing.
- Led a strategic project within the Enterprise Services portfolio: data modeling (regression, feature selection, dimension reduction, validation), data work (extracting, preparing, munging, validating), and building analytics pipelines in Teradata and Hadoop using Hive and Spark.
- Incorporated data science, machine learning, data mining, forecasting, and simulations to offer predictive capabilities to clients.
- Extensively collaborated with end users to understand reporting requirements and used report-writing software (i.e., Tableau) to create a variety of simple to complex reports (e.g., standard/ad hoc reporting, dashboards, scorecards, visualizations, and analytics).
- Developed a strategy for system-wide data management activities, promoting data warehousing, master data management, data modeling, and metadata management across both structured and unstructured data assets, as appropriate and warranted by business needs.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Developed analytical components using Scala, Spark, and Spark Streaming.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Developed Scripts and automated data management from end to end and sync up between all the clusters.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Involved in migration from Livelink to SharePoint using Scala through a RESTful web service.
- Extensively involved in developing RESTful APIs using the JSON library of the Play framework.
- Used Scala collection framework to store and process the complex consumer information.
- Used Scala functional programming concepts to develop business logic.
- Designed and implemented an Apache Spark application on Cloudera.
- Imported and exported data into HDFS using Sqoop, Flume, and Kafka.
- Troubleshot and debugged Hadoop ecosystem run-time issues.
- Analyzed affected code-line objects and designed suitable algorithms to address problems.
- Assisted in performing unit testing of Map Reduce jobs using MRUnit.
- Assisted in exporting data into Cassandra and writing column families to provide fast listing outputs.
- Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce extraction jobs.
- Used ZooKeeper to provide coordination services to the cluster.
- Worked with the cluster GUI for easy job scheduling, file browsing, job browsing, and metastore management.
Environment: Apache Hadoop, HDFS, Hive, Java, Sqoop, Spark, Cloudera CDH4, Oracle, MySQL, Tableau, Talend, Elasticsearch, Kibana, SFTP.
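As referenced above, a minimal Scala sketch of serving SQL over Parquet data in S3 with Spark SQL; the bucket paths, view name, and column names are hypothetical placeholders rather than the actual project values.

```scala
import org.apache.spark.sql.SparkSession

object S3ParquetSqlJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-parquet-sql") // hypothetical app name
      .getOrCreate()

    // Read Parquet files from the S3 data lake (bucket/prefix are placeholders).
    val events = spark.read.parquet("s3a://example-data-lake/user-behavior/")

    // Expose the dataset to Spark SQL as a temporary view.
    events.createOrReplaceTempView("user_behavior")

    // Example cleansing/summarization query (column names are assumptions).
    val dailySummary = spark.sql(
      """
        |SELECT event_date, user_id, COUNT(*) AS event_count
        |FROM user_behavior
        |WHERE user_id IS NOT NULL
        |GROUP BY event_date, user_id
        |""".stripMargin)

    // Write the summarized result back to S3 in Parquet, partitioned by date.
    dailySummary.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3a://example-data-lake/curated/daily_user_summary/")

    spark.stop()
  }
}
```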
Confidential, Irving, TX
Hadoop Developer
Responsibilities:
- Developed and implemented real-time data pipelines with Spark Streaming, Kafka, and Cassandra to replace the existing lambda architecture without losing its fault-tolerant capabilities.
- Creating, optimizing, updating, and maintaining logical and physical data models for various databases, applications, and systems.
- Created a Spark Streaming application to consume real-time data from Kafka sources and applied real-time data analysis models that can be updated on new data as it arrives in the stream (a minimal sketch appears at the end of this section).
- Wrote transformations and actions on DataFrames and used Spark SQL on DataFrames to access Hive tables in Spark for faster data processing.
- Led design reviews of data models and relevant metadata to ensure consistency, quality, accuracy, and integrity.
- Collaborated with database administrators in creating physical data schemas from the logical and physical data models to ensure compliance with business requirements.
- Strong knowledge of and experience with Spark architecture and components; efficient in working with Spark Core, Spark SQL, and Spark Streaming; implemented Spark Streaming jobs by developing RDDs (Resilient Distributed Datasets) and used PySpark and spark-shell accordingly.
- Designed and developed data mapping and transformation scripts to support and promote data warehouse development, structural changes of multiple RDBMS and data analytics efforts as well as design effective ETL logic and code as required.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Defined and governed data modeling and design standards, tools, best practices, and related development methodologies as required.
- Performed ETL data cleansing, integration, and transformation using Hive and PySpark; responsible for managing data from disparate sources.
- Utilized data modeling tools and associated graphical methods to depict and analyze conceptual, logical, and physical data schemas.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Evaluated data models, business data objects and physical databases for proper usage or re-use of data models in different environments.
Environment: Erwin data modeler, Oracle Designer, Spark, Spark SQL, Spark Streaming, Toad database management toolset, ER/Studio software, Snowflake, DataStage, JIRA, Oracle SQL Developer Data Modeler.
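A minimal Scala sketch of the Kafka-to-Spark streaming consumption described above, written against the Structured Streaming API (the original work may have used the DStream API); the broker address, topic, checkpoint path, and windowed count are placeholders for illustration, and the console sink stands in for the Cassandra writes.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaStreamingJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-streaming") // hypothetical app name
      .getOrCreate()
    import spark.implicits._

    // Consume the raw event stream from Kafka (broker/topic are placeholders).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers bytes; cast the value to a string and keep the event timestamp.
    val events = raw.selectExpr("CAST(value AS STRING) AS json", "timestamp")

    // Windowed count per one-minute window as a stand-in for the real-time model update.
    val counts = events
      .withWatermark("timestamp", "5 minutes")
      .groupBy(window($"timestamp", "1 minute"))
      .count()

    // Write results to the console; the production job wrote to Cassandra instead.
    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/kafka-streaming") // placeholder path
      .start()

    query.awaitTermination()
  }
}
```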
Confidential
Bigdata Developer
Responsibilities:
- Wrote MapReduce code to process all log files against rules defined in HDFS (log files generated by different devices follow different XML rules).
- Developed and designed application to process data using Spark.
- Developed MapReduce jobs and Hive and Pig scripts for a data warehouse migration project.
- Developed and designed a system to collect data from multiple portals using Kafka and then process it using Spark.
- Developed MapReduce jobs and Hive and Pig scripts for a Risk & Fraud Analytics platform.
- Developed a data ingestion platform using Sqoop and Flume to ingest Twitter and Facebook data for a Marketing & Offers platform.
- Developed and designed an automated process using shell scripting for data movement and purging.
- Installation & Configuration Management of a small multi node Hadoop cluster.
- Installation and configuration of other open-source software like Pig, Hive, Flume, Sqoop.
- Developed programs in Java and Scala/Spark to reformat data extracted from HDFS for analysis.
- Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
- Imported and exported data into Impala, HDFS, and Hive using Sqoop.
- Responsible for managing data coming from different sources.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access (see the sketch at the end of this section).
- Developed Hive tables to transform and analyze the data in HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Involved in running Hadoop Jobs for processing millions of records of text data.
- Developed the application by using the Struts framework.
- Created connection through JDBC and used JDBC statements to call stored procedures.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Developed Pig UDFs to pre-process the data for analysis.
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Moved all RDBMS data, exported as flat files from various channels, into HDFS for further processing.
- Developed job workflows in Oozie to automate the tasks of loading the data into HDFS.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop.
- Wrote script files for processing data and loading it into HDFS.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java (jdk1.7), Flat files, Oracle 11g/10g, PL/SQL, SQL*PLUS, Windows NT, Sqoop.
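A minimal sketch of the Hive partitioning and dynamic-partition loading mentioned above, issued here as HiveQL through a Scala Spark session; the table names, columns, and the staging table logs_raw are hypothetical, and the bucketing noted in the bullet is omitted for brevity.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning") // hypothetical app name
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic-partition inserts.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Partitioned target table (names and columns are placeholders).
    spark.sql(
      """
        |CREATE TABLE IF NOT EXISTS logs_partitioned (
        |  device_id STRING,
        |  message   STRING
        |)
        |PARTITIONED BY (log_date STRING)
        |STORED AS PARQUET
        |""".stripMargin)

    // Dynamic-partition insert from a raw staging table (assumed to exist).
    spark.sql(
      """
        |INSERT OVERWRITE TABLE logs_partitioned PARTITION (log_date)
        |SELECT device_id, message, log_date
        |FROM logs_raw
        |""".stripMargin)

    spark.stop()
  }
}
```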
Confidential
Java Developer
Responsibilities:
- Implemented various J2EE standards and MVC framework involving the usage of Struts, JSP, AJAX and servlets for UI design.
- Used SOAP/ REST for the data exchange between the backend and user interface.
- Utilized Java and MySQL from day to day to debug and fix issues with client processes.
- Developed, tested, and implemented financial-services application to bring multiple clients into standard database format.
- Assisted in designing, building, and maintaining database to analyze life cycle of checking and debit transactions.
- Created web service components using SOAP, XML and WSDL to receive XML messages and for the application of business logic.
- Involved in configuring WebSphere variables, queues, data sources, and servers, and in deploying EARs to the servers.
- Involved in developing the business Logic using Plain Old Java Objects (POJOs) and Session EJBs.
- Developed authentication through LDAP using JNDI.
- Developed and debugged the application using Eclipse IDE.
- Involved in Hibernate mappings, configuration properties set up, creating sessions, transactions and second level cache set up.
- Involved in backing up databases, creating dump files, and creating DB schemas from the dump files. Wrote and executed developer test cases and prepared the corresponding scope and traceability Confidential.
- Used JUnit to develop test cases for all modules and JAD for debugging.
- Hands-on experience with Sun ONE Application Server, WebLogic Application Server, WebSphere Application Server, WebSphere Portal Server, and J2EE application deployment technology.
Environment: Java (multithreading, Collections), JDBC, Hibernate, Struts, Spring, JSP, Servlets, SOAP, Maven, Subversion, JUnit, SQL, Oracle, XML, PuTTY and Eclipse.