Senior Big Data Analyst Resume

Bethlehem, PA

SUMMARY:

  • Overall 9+ years of IT experience with clients across different industries, involved in all phases of the SDLC on different projects, including 4+ years in big data.
  • Hands-on experience as a Hadoop architect on versions 1.x and 2.x and with components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce, along with Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper and NoSQL databases like HBase
  • Experienced in developing BI reports and dashboards using Pentaho Reports and Pentaho Dashboards.
  • Expertise in writing Hive and Pig scripts and UDFs to perform data analysis on large data sets
  • Worked with data formats such as TextFile, SequenceFile, RCFile (Record Columnar), ORC (Optimized Row Columnar) and Parquet in HDFS
  • Experience in developing Data Warehouse architecture and Data Lake
  • Partitioned and Bucketed data sets in Apache Hive to improve performance
  • Managed and scheduled jobs on a Hadoop cluster using Apache Oozie
  • Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Willing to work on weekends on a rotation basis.
  • Experienced with Shell, Perl and Python scripting on Linux, AIX and Windows Platforms
  • Excellent experience in developing web services with the Python programming language.
  • Worked with the NoSQL database HBase to retrieve data from sparse datasets
  • Experience in creating SparkContext, Spark SQLContext and Spark StreamingContext objects to process huge sets of data
  • Experience in performing SQL and Hive operations using Spark SQL
  • Performed real-time analytics on streaming data using Spark Streaming
  • Created Kafka Topics and distributed to different consumer applications
  • Strong experience in design and development of relational database concepts with multiple RDBMS databases including Oracle 10g, MySQL, MS SQL Server & PL/SQL
  • Developed applications using Java, RDBMS and UNIX Shell scripting, Python
  • Experience with Scala's functional programming features, case classes and traits; used Scala to write Spark applications.
  • Responsible for building out and improving the reliability and performance of cloud applications and cloud infrastructure deployed on Amazon Web Services.
  • Developed shell and Python scripts to check the health of Hadoop daemons and schedule jobs
  • Good knowledge of Azure cloud services and Azure storage
  • Involved in core data pipeline code, involving work in Java, C++ and Python, and built on Apache Kafka, Apache Storm
  • Migrated servers from physical to cloud environments on AWS, using services such as EC2, S3, Auto Scaling, RDS, ELB, EBS, IAM and Route 53 to install, configure, deploy and troubleshoot on various Amazon machine images.
  • Web application developer in Java and Python with 3 years of academic and professional experience developing client-side interfaces using JavaScript, jQuery, HTML5 and CSS3
  • Involved in developing distributed enterprise and web applications using UML, Java/J2EE and web technologies that include EJB, JSP, Servlets, Struts 2, JMS, JDBC, JAX-WS, JPA, HTML, XML, XSL, XSLT, JavaScript, Spring and Hibernate
  • Experience in Web application development using Java, Servlets, JSP, JSTL, Java Beans, EJB, JNDI, JDBC, DHTML, CSS, PHP and AJAX.
  • Expertise in using J2EE application servers like WebLogic 8.1/9.2 and IBM WebSphere 7.x/6.x, and web servers like Tomcat 5.x/6.x
  • Working knowledge of software design patterns, big data technologies (Hadoop, Hortonworks Sandbox) and cloud technologies and design
  • Experienced in using Agile software methodology (scrum)
  • Designed Use Case diagrams, Class diagrams, Activity diagram, Sequence diagrams, Flow Charts, and deployment diagrams using Rational Rose Tool
  • Experience with IDEs like Eclipse, NetBeans, RAD and JBuilder for developing J2EE/Java applications
  • Experience with design patterns like MVC, Singleton, Factory, Proxy, DAO, Abstract, Prototype and Adapter
  • Proficient in writing and handling SQL Queries, Stored Procedures, and triggers
  • Hands-on experience with user acceptance, black-box, white-box and unit testing
  • Knowledge of multi-vendor operating systems, including Linux, Windows and UNIX, and of shell scripting

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, HBase, Hadoop MapReduce, Pig, Hive, Sqoop, Spark, Scala, Kafka, Storm, Oozie, ZooKeeper, Cassandra

Language: C, C++, Java, Python, Ruby, MySQL, SQL Server, MongoDB

Databases: Oracle 10g, DB2, MySQL, SQL Server, MongoDB, Talend

Web Technologies: HTML, JavaScript, XML, ODBC, JDBC, MVC, Ajax, JSP, Servlets, Struts

IDE/Testing Tools: Eclipse, AWS

Operating Systems: UNIX, Windows 9/7/XP/2000/NT/ME/98

Version Control Tools: SVN, Git

MS Software Packages: MS Office, MS Excel, MS Access

ETL Tools: Informatica, Talend and Pentaho

PROFESSIONAL EXPERIENCE:

Confidential

Senior Big Data Analyst

Responsibilities:

  • Enhanced X-Ray, a data comparison tool that compares data between two data warehouses.
  • The X-Ray tool was written in Python, and the enhancements were also made in Python.
  • Used Python data frames to load the data from the tables.
  • Made enhancements such as generating queries on the fly, generating hash values, and comparing hash values instead of comparing record against record.
  • Developed and implemented core API services using Spark with Python.
  • Replaced existing MapReduce jobs using Spark Context, Spark-SQL, Data Frames and Pair RDDs.
  • Used the RegEx, JSON and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data.
  • Involved in creating Hive tables and in loading and analyzing data using Hive queries.
  • Involved in performance tuning of Hive queries by implementing dynamic partitions and buckets.
  • Integrated Hive and HBase for better performing MapReduce Jobs.
  • Automated all the jobs that pull data from the server and load it into Hive tables daily, using Oozie workflows.
  • Used GitHub to check in code changes that update the tool hosted on an EC2 server.
  • Used Amazon Web Services: EC2 to host the tool and an S3 bucket to store the results.
  • Used the Airflow UI to run the DAGs, which serves as a graphical front end for executing the tool.
  • Involved in developing a Jenkins pipeline to trigger the flow of tasks.
  • Experienced working on Teradata Database.
  • Worked with Snowflake Database.
  • Enhanced the shell script that loads data into the Teradata database.
  • Used Informatica to transform data from BW into the Teradata database.
  • Good exposure to VersionOne, an issue-tracking tool for Agile methodologies.
  • Experience using TPT (Teradata Parallel Transporter) scripts to move data in parallel from S3 to Teradata.
  • Used Autosys to schedule jobs.
  • Worked with EXPORT FRAMEWORK.
  • X-Ray was developed to replace manual data validation in the Lift and Shift project, in which data is moved from BW to Teradata and Snowflake.
  • The first version of X-Ray had a few limitations, such as being installed and executed on a local machine.
  • Also, the user had to write each query by hand-picking the column names from the data warehouses.
  • As an improvement on this version, the new version of X-Ray is hosted on an EC2 instance, and DAGs are run to execute the tool.
  • When code is checked into GitHub, the Jenkins pipeline deploys the changes to the EC2 machine.
  • With this enhanced X-Ray tool, the user can validate the data for any number of tables in one run.
  • An Airflow script generates the DAG on Airflow, which, when run, executes the tool on EC2.
  • When executed, the tool queries Teradata and Snowflake, and the hash values are returned as result sets into two data frames.
  • The data frames are then compared to check for differences.
  • Any data differences are written to a file in an S3 bucket, and a link to the file is sent by e-mail.
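
The hash-based comparison described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the X-Ray code itself: plain tuples stand in for the data frames, the sample rows are made up, and the names `row_hash`, `hash_table` and `compare_hashes` are hypothetical, as is the assumption that the first column is the row key.

```python
import hashlib


def row_hash(row):
    """Return a SHA-256 hex digest of a row's values joined in column order."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join("" if v is None else str(v) for v in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def hash_table(rows, key_index=0):
    """Map each row's key (assumed here to be the first column) to its hash."""
    return {row[key_index]: row_hash(row) for row in rows}


def compare_hashes(source, target):
    """Compare two {key: hash} maps and list keys that differ or are missing."""
    return {
        "missing_in_target": sorted(k for k in source if k not in target),
        "missing_in_source": sorted(k for k in target if k not in source),
        "mismatched": sorted(
            k for k in source if k in target and source[k] != target[k]
        ),
    }


# Made-up sample rows standing in for the two warehouse result sets.
teradata_rows = [(1, "alice", 10.5), (2, "bob", 7.0), (3, "carol", 3.2)]
snowflake_rows = [(1, "alice", 10.5), (2, "bob", 7.5), (4, "dave", 1.0)]

diff = compare_hashes(hash_table(teradata_rows), hash_table(snowflake_rows))
print(diff)
```

Comparing one digest per row keeps the comparison independent of the number of columns, which is presumably why the tool moved away from record-by-record checks.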

Environment: Python, PyCharm, GitHub, Git Desktop, Airflow, Amazon EC2, Amazon S3, Jenkins, Shell Script, Teradata, Snowflake, Export Framework, Informatica, Version One, Sqoop, Putty, Autosys

Confidential, Bethlehem, PA

Hadoop Developer

Responsibilities:

  • Imported data from MySQL to HDFS and exported data from HDFS to MySQL using Apache Sqoop
  • Modified and Optimized databases to speed up importing to HDFS
  • Performed data analysis of online secure data by importing data to HDFS using Apache Flume
  • Used Sqoop to import Teradata data to HDFS
  • Experience in deploying Hadoop 2.0 (YARN).
  • Extracted and modified data from files, MySQL, Oracle and other input sources and loaded it into HDFS
  • Developed Python mapper and reducer scripts and ran them using Hadoop Streaming
  • Performed AWS Cloud administration managing EC2 instances, S3, SES and SNS services.
  • Migrated Business Critical Applications to AWS Cloud before the deadlines.
  • Developed, tested, documented and implemented security policies for AWS Cloud.
  • Provided cloud brokering services across multiple Tier 1 Cloud Providers: Microsoft Azure and AWS.
  • Experienced working on Pentaho suite (Pentaho Data Integration, Pentaho BI Server, Pentaho Meta Data and Pentaho Analysis Tool).
  • Migrated servers, databases and applications from on-premises to AWS, Azure and Google Cloud Platform
  • Performed troubleshooting; managed and reviewed data backups and Hadoop log files.
  • Experience in importing data to Hive using Sqoop/Talend Studio
  • Good exposure to creating mappings in Talend between the source Oracle DBs and target Hive tables
  • Cleaned data and preprocessed data using MapReduce for efficient data analysis
  • Used Scala and Java to develop MapReduce programs for data cleansing and analysis
  • Developed custom UDFs using Apache Hive to manipulate data sets
  • Created Hive compact/bitmap indexes to speed up the processing of data
  • Created/Inserted/Updated Tables in Hive using DDL, DML commands
  • Improved performance of datasets for querying through
  • Worked with Hive file formats such as ORC, SequenceFile and text file, and with partitions and buckets, to load data into tables and perform queries
  • Used Pig custom loaders to load data from different file types such as XML, JSON and CSV
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS
  • Handled monitoring of systems and services, architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Scheduled workflow of jobs using Oozie to perform sequential and parallel processing
  • Worked with the NoSQL database HBase to perform operations on sparse data sets
  • Developed shell and Python scripts to check the health of Hadoop daemons and schedule jobs
  • Knowledge of Pentaho Data Integration.
  • Integrated Hive with HBase to upload data and perform row level operations
  • Experienced in creating SparkContext and performing RDD transformations and actions using Python API
  • Used SparkContext to create RDDs to use incoming data to perform Spark Transformations and Actions
  • Created Spark SQLContext to load data from Parquet, JSON files and perform SQL queries
  • Created data frames from text files to execute Spark SQL queries
  • Used Spark's enableHiveSupport to execute Hive queries in Spark
  • Created DStreams on incoming data using createStream
  • Developed Spark streaming applications to work with data generated by sensors in real time
  • Linked Kafka and Flume to Spark by adding dependencies for data ingestion
  • Performed data extraction, aggregation, log analysis on real time data using Spark Streaming
  • Created Broadcast and Accumulator variables to share data across nodes
  • Used Scala case classes, higher-order functions and collections to apply map transformations on RDDs
  • Used sbt to build Spark projects written in Scala and ran them with spark-submit
  • Leveraged the Option monad with Some and None in Scala to avoid null pointer exceptions
  • Implemented pattern matching in Scala to identify the desired sensor type for performing analysis
  • Developed Scala traits to reuse code in other classes
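
The Python mapper and reducer scripts run through Hadoop Streaming, mentioned in the list above, might look roughly like the sketch below. The resume does not say what those scripts computed, so a word count is assumed purely for illustration, and the `mapper` and `reducer` names are hypothetical. In a real Streaming job each function would live in its own script that reads `sys.stdin` and prints tab-separated key/value lines; here the two stages are chained locally so the flow can be followed end to end.

```python
from itertools import groupby


def mapper(lines):
    """Emit a 'word<TAB>1' record for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per key; the input must already be sorted by key,
    which is what the shuffle/sort phase provides between map and reduce."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


# Local simulation of map -> sort -> reduce on made-up input lines.
sample = ["spark hive spark", "hive pig"]
result = list(reducer(sorted(mapper(sample))))
print(result)
```

Keeping the logic in generator functions makes it easy to test locally before submitting the scripts to a cluster.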

Environment: HDFS, MapReduce, Hive, Azure, HBase, Pig, Java, AWS, Python, Oozie, Scala, Kafka, Spark, Git, Maven, Talend, Pentaho, Putty, CentOS 6.4, SBT

Confidential, Livonia, MI

Sr. Hadoop Developer

Responsibilities:

  • Collected log data and staging data using Apache Flume and stored in HDFS for analysis.
  • Implemented helper classes that access HBase directly from Java using the Java API to perform CRUD operations.
  • Handled different time-series data in HBase, storing the data and performing time-based analytics to improve query retrieval time.
  • Developed MapReduce programs to parse the raw data and store the refined data in tables.
  • Performed debugging and fine tuning in Hive & Pig for improving performance.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Analyzed web log data using HiveQL to extract the number of unique visitors per day.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Performed Map side joins on data in Hive to explore business insights.
  • Involved in forecasting based on present results and insights derived from data analysis.
  • Integrated MapReduce with HBase to bulk-import large amounts of data into HBase using MapReduce programs.
  • Built application logic using Python and worked on event-driven programming in Python.
  • Pulled information from Jira using its REST API and Python to populate Excel files for management reports
  • Developed several REST web services supporting both XML and JSON, used by both web and mobile applications.
  • Participated in team discussions to develop useful insights from big data processing results.
  • Suggested trends to higher management based on social media data.

Environment: HDFS, MapReduce, Hive, HBase, Pig, Java, Git, Maven, Talend, Putty, REST, CentOS 6.3

Confidential, NY

Hadoop Developer

Responsibilities:

  • Wrote Pig scripts using various input and output formats, and designed a custom format as per the business requirements.
  • Used Sqoop to import data from a MySQL relational database into HDFS for processing and to export data back to the RDBMS.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing, analyzing and testing the classifier using MapReduce, Pig and Hive jobs.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Pig, Hive and Sqoop) as well as system-specific jobs (such as Perl and shell scripts).
  • Automated all the jobs that pull data from relational databases and load it into Hive tables, using Oozie workflows, and enabled email alerts on any failure.
  • Involved in migrating MapReduce jobs to Spark jobs, and used the Spark SQL and DataFrames APIs to load structured and semi-structured data into Spark clusters
  • Worked on the Spark engine, creating batch jobs with incremental loads through Storm, Kafka, Splunk and Flume.
  • Worked with Kafka on a proof of concept for log processing on a distributed system
  • Used Spark SQL's Scala and Python interfaces, which automatically convert RDDs of case classes to SchemaRDDs
  • A tool monitored log input from several data centers via Spark Streaming; the input was analyzed in Apache Storm, and the data was parsed and saved to a database.
  • Used tools like Sqoop and Kafka to ingest data into Hadoop
  • Implemented Database access through JDBC at Server end with Oracle.
  • Used Spring Aspect Oriented Programming (AOP) for addressing cross cutting concerns.
  • Developed request/response paradigm by using Spring Controllers, Inversion of Control and Dependency Injection with Spring MVC.
  • Used CVS for version control and Log4j for logging.
  • Used Pig and Hive in the analysis of data.
  • Extracted files from NoSQL databases like Cassandra using Sqoop.
  • Worked with Flume to import log data from the reaper logs and syslogs into the Hadoop cluster.
  • Used complex data types like bags, tuples, and maps in Pig for handling data.
  • Created and modified UDFs and UDAFs for Hive whenever necessary.
  • Involved in managing running and pending MapReduce tasks through the Cloudera Manager console.
  • Developed Pig UDFs for preprocessing the data for analysis.
  • Involved in writing shell scripts for scheduling and automation of tasks.
  • Managed and reviewed Hadoop log files to identify issues when jobs failed.
  • Hands-on experience with NoSQL databases like HBase and Cassandra for a POC (proof of concept) in storing URLs, images, products and supplements information in real time.
  • Worked with Hive for analysis and for transforming files from different analytical formats into text files.
  • Used Hue for UI-based Pig script execution and Oozie scheduling
  • Involved in writing Hive queries for data analysis with respect to business requirements.
  • Also assisted admin team in installation and configuration of additional nodes in Hadoop cluster

Environment: Apache Hadoop (Gen 1), Hive, Pig, Sqoop, Oozie, HBase, MapReduce (MR1), Cloudera, HDFS, Flume, Hue, Linux, HTML5 & CSS3, Hadoop 2.2, jQuery, Maven, MongoDB, Java, JDK 1.6, J2EE, JDBC, Spring 2.0, Hibernate 4.2.

Confidential

J2EE/Java Developer

Responsibilities:

  • Developed various UML diagrams like use cases, class diagrams, interaction diagrams (sequence and collaboration) and activity diagrams.
  • Responsible for designing and implementing the web tier of the application from inception to completion using J2EE technologies such as MVC framework, Servlets, JavaBeans, JSP.
  • Developed the application using the Struts framework, which leverages the classic Model-View-Controller (MVC Model 2) architecture.
  • Implemented Business processes such as user authentication, Account Transfer using Session EJB.
  • Implemented Hibernate for O/R mapping and persistence.
  • Worked on Creative Suite 3 and Creative Suite 4 for creating websites and presentations.
  • Involved in the components styling (CSS) and skinning.
  • Involved in multi-tiered J2EE design utilizing Spring IOC and Hibernate deployed on WebSphere Application Server connecting to DB2 database.
  • Used Java Messaging Services (JMS) for reliable and asynchronous exchange of important information such as payment status report.
  • Developed JUnit test cases for all the developed modules.
  • Extensively used DB2 Database to support the SQL.
  • Used CVS for version control across common source code used by developers.
  • Used Log4J to capture logs, including runtime exceptions.
  • Used JDBC for database connectivity and to invoke stored procedures.
  • Responsible for data reconciliation with EOD files using scheduled batch process.
  • Responsible for system development using J2EE architecture.
  • Used Spring Framework for dependency injection, transaction management and AOP.
  • Involved in Spring MVC model integration for the front-end request action controller.
  • Developed using Spring, Hibernate, Struts, Oracle, JPA, jQuery, JavaScript and Spring Core.
  • Used Spring ORM support, Hibernate for development of DAO layer.
  • Involved in implementing the DAO pattern for database connectivity and Hibernate.
  • Wrote SQL queries and modified the existing database structure as required to add new features.
  • Involved in designing the database and developed stored procedures and triggers using PL/SQL.
  • Conducted database and code tuning to improve application performance, using bulk binds, in-line queries, dynamic SQL, analytics and sub-query factoring.
  • Involved in the JMS connection pool and the implementation of publish and subscribe using Spring JMS.
  • Used JMS Template to publish and Message Driven POJO (MDP) to subscribe from the JMS provider

Environment: Windows, IBM WebSphere Application Server, Eclipse, Spring, Hibernate 3.0, Struts 1.2, EJB, DB2, Java 1.4/J2EE, JDBC, JSP, JSF, JavaScript, HTML, CSS, DHTML, AJAX, CS4, JNDI 1.2, DOM, JMS 1.0.1, XML, Web Services, POJO, ANT, Rational Rose, Apache Axis, WSDL, PL/SQL, LOG4J, CVS

Confidential

Java Developer

Responsibilities:

  • Generated object-relational mappings (ORM) using XML for Java classes and databases.
  • Used Eclipse platform to design and code in J2EE stack.
  • Developed user interfaces using JSP, HTML, JavaScript and XML.
  • Designed and developed an enterprise-wide common logging layer around Log4j with centralized log support (using the info, error and debug logger levels).
  • Implemented Java and J2EE design patterns like Business Delegate, Data Transfer Object (DTO), Data Access Object (DAO) and Service Locator.
  • Applied jQuery/JavaScript to build a responsive GUI.
  • Set up a distributed environment and deployed the application on the distributed system.
  • Developed stored procedures and triggers using PL/SQL in order to calculate and update the tables to implement business logic.
  • Used the Spring Framework AOP module to implement logging in the application to know the application status XML Parsing/Domain.
  • Developed JUnit test cases and validated user input using regular expressions in JavaScript as well as on the server side.
  • Used JDBC to connect the web applications to Databases.
  • Used parsers like SAX and DOM for parsing XML documents and performed XML transformations using XSLT.
  • Designed REST APIs that allow sophisticated, effective and low-cost application integration.
  • Designed and documented REST/HTTP APIs, including JSON data formats and API versioning strategy.
  • Gained knowledge of building sophisticated distributed systems using REST/hypermedia web APIs (SOA) and developed POCs.

Environment: Java/J2EE, Oracle 10g, SQL, PL/SQL, JSP, EJB, Struts, Hibernate, WebLogic 8.0, HTML, AJAX, JavaScript, JDBC, XML, JMS, XSLT, UML, JUnit, Log4j, Eclipse 6.0.
